ADA036  187 


SPERRY UNIVAC
COMPUTER SYSTEMS

PROSODIC  AIDS  TO 
SPEECH  RECOGNITION: 

IX.  ACOUSTIC-PROSODIC  PATTERNS 
IN  SELECTED  ENGLISH  PHRASE 
STRUCTURES 


by 

WAYNE  A.  LEA 


DEFENSE  SYSTEMS  DIVISION 
ST.  PAUL,  MINNESOTA 
(612)  466-2434 


Final  Technical  Report  Submitted  To: 


ADVANCED  RESEARCH  PROJECTS  AGENCY 
1400  WILSON  BOULEVARD 
ARLINGTON,  VIRGINIA  22209 

Attention:  DIRECTOR,  IPTO 


31 December 1976
Report  No.  PX11963 


DISTRIBUTION STATEMENT A

Approved for public release;
Distribution Unlimited


This research was supported by the Advanced Research Projects Agency of the Department of Defense
under Contract No. DAHC15-73-C-0310, ARPA Order No. 2010. The views and conclusions contained in
this document are those of the author, and should not be interpreted as necessarily representing the official
policies, either expressed or implied, of the Advanced Research Projects Agency or the U.S. Government.


PREFACE 

This  is  the  ninth,  and  last,  in  a series  of  reports  on  Prosodic  Aids  to 
Speech  Recognition.  The  previous  reports  appeared  as  follows: 


   I.     Basic Algorithms and Stress Studies          1 October, 1972      PX 7940

   II.    Syntactic Segmentation and Stressed          15 April, 1973       PX 10232
          Syllable Location

   III.   Relationships Between Stress and             21 September, 1973   PX 10430
          Phonemic Recognition Results

   IV.    A General Strategy for Prosodically-         29 March, 1974       PX 10791
          Guided Speech Understanding

   V.     A Summary of Results to Date                 31 October, 1974     PX 11087

   VI.    Timing Cues to Linguistic Structure          31 March, 1975       PX 11239
          and Improved Computer Programs

   VII.   Experiments on Detecting and Locating        14 November, 1975    PX 11534
          Phrase Boundaries

   VIII.  Listeners' Perceptions of Selected           21 June, 1976        PX 11711
          English Stress Patterns

This research was supported by the Advanced Research Projects Agency of the
Department of Defense, under Contract No. DAHC15-73-C-0310, ARPA Order No. 2010.

The  views  and  conclusions  contained  in  this  document  are  those  of  the  author 
and  should  not  be  interpreted  as  necessarily  representing  the  official  policies, 
either  expressed  or  implied,  of  the  Advanced  Research  Projects  Agency  or  the 
U.  S.  Government. 

I would like to acknowledge the help, during various phases of our project,
of the following other Sperry Univac researchers:

     Dr. Mark Medress,
     Dr. Timothy Diller,
     Dean Kloker, and
     Toby Skinner.




SUMMARY 

For  four  and  one  half  years,  Sperry  Univac  has  been  engaged  in  research 
on  prosodic  structures  and  their  use  in  speech  understanding  systems,  as  a 
part  of  the  large  ARPA  Speech  Understanding  Research  (SUR)  program.  This, 
the  final  report  on  the  effort,  describes  our  two  most  recent  studies  and 
summarizes  the  results  of  our  entire  work  for  ARPA. 

One  of  our  two  major  accomplishments  during  this  final  reporting  period 
was  to  complete  our  aspects  of  a cooperative  study  with  Bolt  Beranek  and 
Newman  (BBN),  to  develop  prosodic  aids  for  the  parser  in  the  BBN  "Hear  What 
I Mean"  (HWIM)  speech  understanding  system.  We  marked  every  transition  arc 
in  the  SMALLGRAM  grammar  used  in  the  HWIM  system,  indicating  whether  the 
word  associated  with  that  point  in  generated  syntactic  structures  would  or 
would  not  be  immediately  preceded  by  a phrase  boundary  detected  from  the 
fundamental  frequency  (Fo)  contours  associated  with  the  speaking  of  that 
structure.  Previous  research  had  shown  that  the  first  stressed  syllable  of 
major  syntactic  phrases  in  spoken  sentences  would  be  immediately  preceded  by 
a fall-rise  "valley"  in  Fo,  so  that  the  first  stressed  words  of  a phrase 
should  be  marked  in  the  grammar  as  expected  to  be  preceded  by  a boundary. 

We  proposed  to  BBN  that  procedures  be  implemented  that  would  increase 
the  score  on  words  hypothesized  by  the  HWIM  system  if  those  words  were  expected 
to  be  preceded  by  a boundary  and  application  of  the  Fo-based  phrase  boundary 
detector  showed  a boundary  occurring  just  before  the  hypothesized  position 
of  that  word  in  the  utterance.  Thus  if  a boundary  was  detected,  those  words 
were  rewarded  (increased  in  score)  that  predicted  a boundary  at  that  point. 

If  a boundary  was  expected  before  a hypothesized  word  (that  is,  the  word 
was  marked  in  the  grammar  as  one  expected  to  be  immediately  preceded  by  a 
boundary),  but  no  boundary  was  detected  in  the  Fo  data,  the  score  on  that 
hypothesized  word  would  be  reduced  substantially.  In  essence,  if  you 
expect  a boundary,  you  better  get  it,  and  if  you  do,  you  should  try  that 
word  or  phrase  hypothesis  earlier  than  ones  whose  expected  prosodies  disagree 
with  the  detected  prosody. 




To test this specific concept of prosodic aids to parsing, we processed
a total of sixteen sentences through our prosodic analysis tools, and
compared the detected phrase boundaries with BBN's control and parsing traces
of HWIM's attempts to understand those sentences. BBN's latest method of
hypothesis scoring, based on a shortfall density function, had to be
accommodated. Also, we needed to help BBN define exactly when a boundary
may be said to "immediately precede" a hypothesized word. We found that,
for all the sentences, the prosodic adjustments in scores on one-word
and multiple-word theories would help correct word sequences to be tried
before erroneous ones. Thus, prosodics clearly seemed to offer frequent
chances of helping the parser.

BBN consequently implemented procedures for using the marked arcs of
the grammar, the Fo-detected phrase boundaries, and the adjustment of scores
on hypothesized word sequences or "theories". Unfortunately, their ARPA project
ended before these procedures could be tested to confirm the value of the
prosodic aids to parsing. Still, we believe that our hand analyses showed
real promise of intonational phrase boundaries guiding the parser to correct
parses.

One other major accomplishment during these last few months was to
analyze the acoustic-prosodic patterns in 255 of our carefully designed
database sentences. Perceived stress patterns for those sentences had already
been obtained in our previous studies. We found that over 91% of the
syllables were correctly detected, despite the difficult all-sonorant sequences
in many of these sentences. Also, 76% of the expected phrase boundaries were
correctly detected, and 92% of the perceived stresses were correctly located.
These were very encouraging results. However, also of interest were the
regularities in boundary placement, stress assignment, and intonation found
from these studies of sentences with minimally contrastive structures.

We found that, following a phrase with a stress, boundaries occurred
before the first stresses of: noun phrases; sentence adverbs; conjuncts;
relative clauses; parentheticals; main verbs; and auxiliaries with stresses
(e.g., containing a negative). Boundaries also occurred before stressed
prepositions and between the parts of a compound noun. Boundaries were
more reliably detected if unstressed syllables intervened between the last
stress of the previous phrase and the first stress of the following phrase.
Unexpected boundaries regularly occurred before (unstressed) relative
pronouns, and after the inverted auxiliary in yes/no questions (before NP
subjects or, if the subject was a pronoun, before the main verb). About Q% of
all detected boundaries were false, caused by Fo variations near obstruents and
by utterance-initial and utterance-final variations in Fo that can be ruled out
as wrong places for boundaries.

We learned from those controlled studies that 99% of all first stresses
in sentences occur just before the Fo peak of the sentence. The stress location
algorithm can be significantly improved by incorporating this fact. Many
errors (missed stresses, and unstressed syllables falsely called stressed)
resulted from failures to separate two short syllables, so that they look like
one long ("stressed") syllable. The test for prepausal stresses also needs some
refinement. While there is room for improvement in the stress location
program, it is in general giving good results.

An overview of all our work for ARPA over the last four and one half
years is given in Section 4 of this report. In brief, we have provided
extensive evidence for the value of prosodics in speech understanding
systems, have cooperated in many common tasks of the total SUR program,
have provided programs and ideas for the use of prosodics in speech
understanding systems, and have conducted a series of experiments to learn
regularities about the whole gamut of prosodic structures. Rather than
reiterate that extensive list of contributions here, I recommend that the
reader read Table VII (pages 41 to 43) at this time. It presents most of our
major accomplishments in an organized manner.

While the experimental results from our studies are many, the major
conclusions are simply stated. We have shown that prosodic features can
be effectively used to locate regions of phonetic reliability, guide phonological
analyses, aid word matching, and provide parsing and higher level linguistic
analyzers with independent information about the structures of spoken
sentences. Prosodics show promise of providing major improvements in speech
understanding, but no complete test of their effectiveness has been accomplished,
and much is yet to be learned about the prosodic regularities of English that
may aid speech understanding. Further work is needed, and we have
provided the analysis tools, experimental designs, carefully designed
speech databases, hypotheses, and program plans that could make such work
possible. There is no question that prosodic analysis should play an
important role in future speech understanding systems.



TABLE OF CONTENTS

PREFACE

SUMMARY

TABLE OF CONTENTS

1.  INTRODUCTION

2.  PROSODIC AIDS FOR THE BBN PARSER
    2.1  Previous Results
    2.2  Further Work Completed During This Reporting Period

3.  ACOUSTIC-PROSODIC PATTERNS IN 255 SENTENCES
    3.1  Previous Determination of Sentences and Perceived Stress Patterns
    3.2  Improved Methods for Acoustic-Prosodic Processing
    3.3  Intonational Phrase Boundaries
    3.4  Syllabification and Automatic Location of Stresses
    3.5  Testing Intonational Hypotheses
    3.6  Acoustic Correlates of Stress
    3.7  Specific Implications for Speech Understanding Systems

4.  REVIEW OF PROSODICS RESEARCH PROGRAM
    4.1  Overview
    4.2  Defining the Role of Prosodics in Speech Understanding Systems
    4.3  Cooperative Efforts to Advance the Development of Speech
         Understanding Systems
    4.4  Intonation and Phrase Boundaries
    4.5  Perceived Stress Patterns
    4.6  Automatic Location of Stresses
    4.7  Timing Cues to Linguistic Structures
    4.8  Carefully Designed Speech Databases

5.  CONCLUSIONS AND FURTHER STUDIES
    5.1  Conclusions
    5.2  Further Studies

6.  REFERENCES

7.  APPENDICES
    APPENDIX A: BOUNDARIES AND STRESS PATTERNS IN THE 255 DATABASE SENTENCES
    APPENDIX B: PRESENTATIONS AND PUBLICATIONS BY SPERRY UNIVAC DURING THE
                ARPA/SUR PROGRAM




1. INTRODUCTION

This is the ninth, and last, in a series of reports on Sperry Univac's
contracts under the Speech Understanding Research program sponsored by the
Advanced Research Projects Agency (ARPA). Sperry Univac's research is concerned
with extracting prosodic information from the acoustic waveform of connected
speech (sentences and discourses), and using that prosodic information to detect
phrase boundaries, locate stressed syllables, determine rhythm and rate of
speech, and apply such prosodic features to guiding word matching, syntactic
parsing, and semantic analyses.

Sperry Univac's research has been basically two-pronged: (1) developing
prosodic analysis tools and providing other services to builders of speech
understanding systems, and (2) conducting experiments suitable for determining
exactly how prosodic patterns relate to sentence structures. The experiments
have a definite practical goal in mind, however. They are intended to
provide adequate understanding so that prosodic tools can be implemented and
improved in such a way as to provide substantial aids to other aspects of the
speech understanding process.

In Section 4 of this report, the reader will find a review of all the
research done by Sperry Univac to provide prosodic aids to speech understanding
and conduct research on prosodic structures. Before we get to that review,
however, we must first consider the final two experimental studies just
completed in this program. In this last half year, we have completed our study
of how intonationally-detected phrase boundaries might be used in the Bolt
Beranek and Newman HWIM ("Hear What I Mean") system. As described in Section 2,
the most recent procedures for scoring alternative syntactic theories in the
HWIM system were considered, the latest form of the grammar was marked for
prosodic expectations, and a set of ten additional sentences was processed to
determine whether detected phrase boundaries could help the HWIM parser. In the
last days of their SUR project, BBN was not able to test our ideas as we had hoped,
but our analyses do suggest good potential for prosodic information to aid in
obtaining correct parses of sentences.

In Section 3, we report on the acoustic-prosodic patterns found in 255
representative sentences taken from our 3300-sentence speech database. With
these carefully controlled sentence structures, we were able to determine various
prosodic regularities that accompany various English phrase structures.
Several questions still remain, however, and further tests are needed.

Following the review in Section 4, Section 5 covers our general
conclusions and recommendations for further work. Our recent studies of prosodic
patterns in controlled English texts have shown the great value of the speech
database and the need to test a variety of hypotheses about prosodic patterns.
Our applications of prosodic analysis tools to guiding speech understanding
systems have shown the need for further research on the direct use of prosodic
information in speech understanding systems. Ideas for furthering such work
are suggested.

References in Section 6 are followed by a listing in Appendix A of the
sentences and their prosodic structures, and, in Appendix B, a listing of all
the presentations and publications completed at Sperry Univac within the ARPA
SUR program.




2. PROSODIC AIDS FOR THE BBN PARSER


2.1  Previous  Results 

In our previous semiannual report (Lea, 1976c), we reported on an
initial experiment to develop means of using prosodic information to help
guide the parsing and control components of the BBN HWIM system. This
study was an attempt to show on paper that prosodics can be of some use, even
before a specific procedure is coded and tested within the HWIM system. It
appeared that some aid can be provided, but the ultimate success, in obtaining
more correct parses or more rapid and efficient parses, awaited further testing
and implementation, which we hoped to accomplish in this last six months, with
BBN's cooperation. Let me first summarize where we were, and then outline our
further work. I will follow the form of presentation used in describing this
work at the 92nd Meeting of the Acoustical Society of America (Lea, 1976d).

While several possible ways of using prosodics in parsing and the control
strategy of a speech understanding system were suggested, time pressures in
the ARPA program forced us to try only a limited form of prosodic aid to
parsing. We had never before tested the idea of using prosodic information
to directly aid parsing, so we confined our efforts in this initial study to
the use of one simple prosodic feature, namely, intonationally-detected phrase
boundaries.

Most readers may be aware that we have previously developed a computer
program to detect boundaries between major syntactic constituents, using
fall-rise valleys in fundamental frequency. Figure 1 shows a typical example of
how this program works, using one of BBN's travel management sentences. The
program finds phrase boundaries at the valleys in fundamental frequency shown
by the vertical lines, before the words "Lynn's" and "ASA". This program has
previously been found to correctly detect about QQ% of the major syntactic
boundaries in connected speech. Recently (Lea, 1975b), we established that the
location of the detected boundary is just before the first stress in the
following constituent, regardless of whether that stress is in phrase-initial
position or not. One problem in using intonationally-detected phrase boundaries
is that this position may not line up with the exact location of the boundary
between phrases as predicted by syntax. We were able to get around this problem
by associating the boundary not with the first word in a phrase, but rather with
the first stress in the phrase. To see how this works, we must first consider
how intonationally-detected phrase boundaries can be introduced into the overall
operation of the HWIM system.
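
The idea of locating boundaries at fall-rise valleys in Fo can be illustrated with a minimal Python sketch. This is not the Sperry Univac program: the representation of the Fo contour as a list of per-frame values, the use of None for unvoiced frames, the function name find_fall_rise_valleys, and the 7% threshold on the fall and the rise are all assumptions made for illustration only.

```python
from typing import List, Optional


def find_fall_rise_valleys(f0: List[Optional[float]],
                           min_change: float = 0.07) -> List[int]:
    """Return frame indices of Fo minima that end a fall and start a rise.

    A valley is reported when Fo has dropped by at least `min_change`
    (as a fraction of the preceding peak) and has then risen again by at
    least `min_change` (as a fraction of the valley value). Unvoiced
    frames (None or non-positive values) are skipped.
    """
    voiced = [(i, v) for i, v in enumerate(f0) if v is not None and v > 0]
    valleys: List[int] = []
    if not voiced:
        return valleys

    low_i, low = voiced[0]   # lowest Fo seen since the current peak
    peak = low               # highest Fo seen since the last valley
    for i, v in voiced[1:]:
        if v < low:
            low_i, low = i, v
        fell = (peak - low) / peak >= min_change
        rose = (v - low) / low >= min_change
        if fell and rose:
            valleys.append(low_i)      # boundary candidate at the minimum
            peak, low, low_i = v, v, i  # start looking for the next valley
        elif v > peak:
            peak, low, low_i = v, v, i  # still climbing: new reference peak
    return valleys


# Hypothetical contour (Hz) with two fall-rise valleys and an unvoiced gap.
contour = [120, 125, 130, 122, 110, 108, 118, 126, 124,
           None, None, 119, 105, 100, 112, 115]
print(find_fall_rise_valleys(contour))
```

In this sketch each reported index marks a candidate boundary position, which would then be associated with the first stressed syllable that follows it, as described above.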

Figure 2 shows a simplified diagram of the relevant aspects of the HWIM
system. The system does a phonetic analysis to hypothesize words that have
a high score of correspondence with the acoustic data. A queue of high-scoring
words, and their positions, is given by way of the system control to the parser.
These one-word "islands" or theories can then be evaluated by the parser, and
an adjusted list of likely words transferred back to system control. Prosodics
may be introduced into the process by adjusting the priority order of promising
words that the parser transfers to system control.

In our initial experiment last May, fifteen sentences were processed through
Sperry Univac's implementations of a pitch tracker and procedure for obtaining phrase
boundaries. These prosodically obtained boundaries were then to be compared with
expected boundaries, to see how well the prosodic structure of an utterance
agreed with that expected for hypothesized word sequences. To exactly specify
how expected boundaries relate to various syntactic structures, we marked
those arcs in the SMALLGRAM grammar whose words are expected to be immediately
preceded by Fo-detected phrase boundaries. Figure 3 displays a sample portion
of the SMALLGRAM grammar, showing by the crosshatched lines those arcs that
are expected to be accompanied by phrase boundaries. This noun phrase constituent
provides descriptions of meetings which one might travel to. Its stressed
beginnings are expected to be accompanied by a boundary. Thus, to describe the
phrase "the last ASA meeting", we take the arc labelled "the", then the arc labelled
"last", which is expected to be accompanied by an Fo-detected phrase boundary.
Then the "sponsor" arc is taken, with the particular sponsor being "ASA", and
finally the arc labelled "meeting" completes the phrase, and the grammar "pops up"
to handle larger phrases. For the phrase "the ASA meeting", the grammar takes the
"the" arc, the jump arc, then the sponsor arc for "ASA", which should be accompanied
by a boundary, and again the "meeting" arc. Other stressed beginnings of the
noun phrase shown at the bottom of the figure are also expected to be marked by
boundaries. These include a sponsor like "ARPA" in a sentence like "John's going
to ARPA", or a proper noun for a destination like "Sperry Univac".
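
The rule that the boundary attaches to the first stressed word of a phrase, rather than to its first word, can be sketched in a few lines of Python. The Word type, the function name expected_boundary_index, and the stress labels assigned to the example words are illustrative assumptions; they are not taken from SMALLGRAM or from the HWIM system.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Word:
    text: str
    stressed: bool  # assumed stress label, supplied by hand here


def expected_boundary_index(phrase: List[Word]) -> Optional[int]:
    """Index of the word expected to be immediately preceded by a boundary.

    Following the rule in the text, this is the first stressed word of the
    phrase. Returns None if the phrase contains no stressed word.
    """
    for i, word in enumerate(phrase):
        if word.stressed:
            return i
    return None


# The two example phrases from the text, with assumed stress labels.
last_asa = [Word("the", False), Word("last", True),
            Word("ASA", True), Word("meeting", True)]
asa = [Word("the", False), Word("ASA", True), Word("meeting", True)]

for phrase in (last_asa, asa):
    i = expected_boundary_index(phrase)
    print(phrase[i].text)
```

Under these assumed labels the boundary is expected before "last" in the first phrase and before "ASA" in the second, matching the arcs described as marked in the grammar.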

Figure 2. Simplified Schematic of How Phrase Boundaries
Guide the HWIM Parser.


Figure 3. A Sample Portion of the SMALLGRAM Grammar Showing by
Crosshatching Those Arcs that are Expected to be
Accompanied by Intonational Boundaries


In general, those arcs of SMALLGRAM which were marked by expected boundaries
included ones labelled with main verbs, the first stressed words in noun phrases
(i.e., quantifiers, adjectives, and nouns, when in first-stress positions),
adverbs, and some special categories (like city, month, duration, digit, first
name, etc.) that are used in BBN's travel-management task. My previous
experiments had shown that Fo-detected boundaries would occur immediately before
the first stressed syllable of such constituents.

Only about one out of every six arcs in the grammar is expected to be
preceded by an Fo-detected boundary (Lea, 1976c, p. 10), yet at least once or twice
in almost every sentence a boundary should occur, so that prosodic information
should be helpful in almost every sentence.

Given that arcs in the grammar are marked if they are expected to be
preceded by an Fo-boundary, we then need to determine whether the detected prosodic
pattern for a sentence does or does not agree with the expectations about boundary
placements, and to use that prosodic evaluation as a guide as to whether or not
certain arcs should be used at an early stage in parsing. For our initial test
last May, BBN supplied complete control and parsing traces for seven of the
fifteen sample sentences recorded for their travel-management task. We processed
all fifteen sentences through our fundamental frequency tracker and obtained
Fo-detected phrase boundaries, but could relate those results to syntax in only
the seven sentences for which control and parsing traces were available. If
an arc was marked to be preceded by a boundary, and a boundary was detected,
then the priority, or "score", for the word on that arc was increased
substantially (+50 points), whereas an expected boundary that was not detected
caused the score of the word on that arc to be substantially decreased (-50 points).
We thus reward theories of hypothesized words when those theories should be
and are preceded by phrase boundaries, and we decrease the priority of words
or theories that predict the presence of boundaries that are not detected.
Within the BBN system, the adjusted score determines the priority of trying that
theory in the subsequent analysis, with highest-score theories tried first.
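
The reward-and-penalty rule just described can be sketched as follows. This is a hedged illustration, not the procedure BBN implemented: the Theory record, the function names, the integer position units, and the tolerance parameter (standing in for whatever definition of "immediately precede" is adopted) are assumptions, and the words and scores in the example queue are invented rather than taken from any HWIM trace.

```python
from dataclasses import dataclass
from typing import Collection, List, Tuple

PROSODIC_ADJUSTMENT = 50  # points; the +50 / -50 change described in the text


@dataclass(frozen=True)
class Theory:
    word: str
    start: int               # hypothesized beginning position of the word
    end: int                 # hypothesized ending position of the word
    score: int               # score before any prosodic adjustment
    expects_boundary: bool   # True if the word's grammar arc is marked


def boundary_before(start: int, boundaries: Collection[int],
                    tolerance: int = 0) -> bool:
    """True if a detected boundary lies within `tolerance` of `start`."""
    return any(abs(b - start) <= tolerance for b in boundaries)


def adjusted_score(theory: Theory, boundaries: Collection[int],
                   tolerance: int = 0) -> int:
    """Reward a met boundary expectation, penalize an unmet one."""
    if not theory.expects_boundary:
        return theory.score  # unmarked arcs are left unchanged
    if boundary_before(theory.start, boundaries, tolerance):
        return theory.score + PROSODIC_ADJUSTMENT
    return theory.score - PROSODIC_ADJUSTMENT


def reorder(theories: List[Theory], boundaries: Collection[int],
            tolerance: int = 0) -> List[Tuple[str, int]]:
    """Return (word, adjusted score) pairs, highest score first."""
    scored = [(t.word, adjusted_score(t, boundaries, tolerance))
              for t in theories]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Invented one-word theories and detected boundary positions.
queue = [
    Theory("car", 0, 4, 120, False),
    Theory("charge", 0, 6, 110, True),
    Theory("our", 10, 12, 105, False),
    Theory("item", 20, 25, 100, True),
    Theory("Boston", 30, 36, 95, True),
]
detected = {0, 20}
print(reorder(queue, detected))
```

In this invented example, "charge" and "item" move ahead of the unmarked words because boundaries were detected where their arcs predict them, while "Boston" drops to the end of the queue because its predicted boundary was not found.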

Figure 4 displays the major reordering which prosodic boundaries can
produce, for initial one-word theories provided to syntax by the system control.
For this one sentence ("Charge Bonnie's trip to budget item five."), the
initial list of priorities or scores for one-word theories as provided by the
original system is shown on the left, while its reordered form after prosodic
adjustments is on the right. Notice that several instances of the correct words
are moved up the list, so that four of the top six theories are now versions
of correct words. The words are followed by two numbers describing their
beginning and ending positions, followed by the word score. For the words
that moved up the list, the scores increased 50 points due to prosodic agreement
with expectations. The reordering, which is typical of the kinds of contributions
prosodics can supply, permits correct theories (such as the words "charge",
"item", and "Bonnie") to be tried at earlier stages in the analysis. Testing of
erroneous words like "car", "our", "her", "their", etc. will be delayed by the
prosodic rearrangement of theories, so that false (misleading) paths can be
avoided.


Figure 4. Prosodic Rearranging of the Queue of Syntactic Events

Also, at later stages, other good multiple-word theories (or "phrases") are
boosted by knowing boundary placements. For example, at a later stage in the
analysis, the word sequence "Charge Bonnie" was boosted because of the correct
occurrence of an expected boundary before "Bonnie". Likewise, "item five" and
"budget item" were boosted by prosodic agreements.

Analysis of all seven control traces supplied by BBN showed many instances
where boundary placements could reorder the selection of theories, and thereby
the prosodics could direct analysis toward correct islands first, potentially
saving computations on lower-score erroneous theories, and more directly reaching
the goal of a correct parse. Without an index to the grammar (to know everywhere
in the grammar each word could occur) and a complete list of theories (not just
the top 15 such as were given on the available traces at each point in a parse),
we could not be completely sure of when prosodics might make a parse succeed
where it had previously failed. Also, without a full understanding of the timing
of various processes, we could not tell exactly what amount or proportion of time
could be saved by the use of boundary locations. Yet, it seemed clear that many
time-consuming tests of erroneous paths would be eliminated or delayed until
after better paths were analyzed when prosodics were used. The final test would
come when the proposed procedures for prosodic aids were implemented and tested
in a total system. Parsing and control traces with the use of prosodics could be
compared with traces when prosodic information is not used.

2.2  Further  Work  Completed  During  This  Reporting  Period 

Our analyses of how prosodics would rearrange the theories in available
control and parsing traces clearly confirmed the value of incorporating prosodic
information into speech understanding systems. We then began a renewed effort
to cooperate with BBN in incorporating these ideas into their system, for actual
tests of whether parsing is more successful with prosodic information than
without it. Our hope was that, before the 1976 version of the BBN HWIM system
was to be demonstrated in September, we would have implemented and tested
within that system all the procedures needed to use prosodic boundaries to
aid the parser.

We received from BBN an updated version of the SMALLGRAM grammar, a grammar
index to establish everywhere each word appeared in the grammar, and control and
parsing traces for ten additional sentences. We marked every arc in SMALLGRAM,
showing one of four degrees of confidence that boundaries would occur with each
of those arcs. We checked and refined these predictions, based on the additional
evidence of detected boundaries in our prosodic processing of nine of the ten
additional sentences. With BBN, we decided that quality scores on theories
would be changed ±50 points for those words that had the highest confidence
level, and ±20 points for the next highest confidence level. These changes in
quality scores on words would then be reflected in subsequent calculations of
shortfall densities within their new scoring strategy. We provided to BBN a
specific assessment of how our prosodic programs had changed since they received
a version over a year ago, and recommended minor changes in their versions to
bring them up to date.
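
The agreed scoring adjustment can be sketched as follows. This is a minimal
illustration, not BBN's implementation: the function and table names are
hypothetical, the reading of the adjustment as a bonus when an expected
boundary is detected and an equal penalty when it is not is an assumption,
and the two lower confidence levels are assumed to leave scores unchanged.
Only the 50- and 20-point magnitudes come from the text.

```python
# Sketch only: names are hypothetical; only the 50/20 magnitudes come from the report.
# Level 1 is the highest of the four confidence levels marked on grammar arcs.
# Levels 3 and 4 are ASSUMED to leave the score unchanged.
POINTS_BY_CONFIDENCE = {1: 50, 2: 20, 3: 0, 4: 0}


def adjust_quality_score(score, confidence_level, boundary_found):
    """Raise the score if the expected boundary was detected, lower it otherwise."""
    points = POINTS_BY_CONFIDENCE[confidence_level]
    return score + points if boundary_found else score - points


def reorder_queue(theories):
    """Sort (word, score, confidence_level, boundary_found) tuples, best first.

    Returns (word, adjusted_score) pairs.
    """
    adjusted = [
        (word, adjust_quality_score(score, level, found))
        for word, score, level, found in theories
    ]
    return sorted(adjusted, key=lambda item: item[1], reverse=True)
```

Under these assumptions, a word scored 100 on a highest-confidence arc would
move to 150 if a boundary is detected before it and to 50 if none is found, so
a correct word can overtake an incorrect word that started with a higher score.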

We recommended to BBN some necessary changes in the grammar. For example,
some arcs previously had stressed words like "costs" on the same arc as
unstressed words ("is", "was"), but, since boundaries would precede only the
stressed words, the words should be separated onto two parallel arcs. Other
changes in the grammar were needed to handle context-sensitive cases where a
boundary occurs before a word only if that word is not preceded by another
(stressed) word that would pull the boundary earlier. Thus, for example, if an
utterance includes the word sequence "the current budget", then a boundary
occurs only before "current" (with none before "budget"), but if the utterance
has "the budget", then the boundary does occur before "budget".

In July, we supplied all the new prosodic results, the completely marked
grammar, the updated scoring adjustment procedures, and the grammar changes to
BBN. We also supplied copies of portions of the traces for the nine sentences,
showing the reordering that would occur if ±50 point adjustments in quality
scores were made where boundaries were expected and found or not found. The
initial one-word scans or queues were significantly improved by the introduction
of boundary information, with more correct words moved to high priority positions
in the lists, and many incorrect words reduced in priority. A few erroneous
words were increased in priority.

Several subsequent queues of events with multiple-word theories were also
provided, showing increased priority for correct theories, although the exact
reordering for multiple-word theories would require full calculations of the
MAXSEGS, shortfall, and shortfall density values used in the new BBN scoring
procedures. These later queues from parsing traces also graphically illustrated
how elimination of one wrong theory from the early stages of analysis could
eliminate many wrong theories in later queues.

One other critical task was to precisely define the time limits for
location of boundaries that may be said to "immediately precede" a marked arc
of the grammar. Since our previous research had shown that, with rare exceptions,
the Fo-detected boundary was located immediately before the first stressed
syllable in a phrase, BBN (specifically Jerry Wolf), with Sperry Univac's advice,
developed the following procedure for relating syllable boundaries, word
boundaries, and detected phrase boundaries. First, each detected phrase boundary
is "smeared" over the two closest syllable nuclei, by bracketing the boundary
position between the beginning of the syllable whose nucleus center is
immediately before the detected time and the ending of the next syllable, whose
nucleus center just follows the boundary time. This requires that the system
use not only the Fo-based boundary detection, but also the syllabification
routine ("CHUNK"). Next, (with one exception, described below) the word
being hypothesized by the system is said to be immediately preceded by that
Fo-detected phrase boundary if the time of the beginning of the word lies within
the smeared phrase-boundary region (between the one syllable onset and the end
of the next syllable). The exception is that, if the word is not stressed on its
first syllable in the lexicon, then a match of boundary location and word
location is considered acceptable as long as there is any overlap of the time
the word spans and the time the smeared boundary spans.
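
The smearing and matching steps just described can be sketched as follows.
This is only an illustration under stated assumptions: the `Syllable` record,
the function names, and the handling of a boundary that falls before the first
or after the last nucleus center (returning `None`) are hypothetical and are
not taken from the BBN or CHUNK code. Times are in seconds.

```python
from typing import List, NamedTuple, Optional, Tuple


class Syllable(NamedTuple):
    onset: float   # beginning of the syllable
    center: float  # center of the syllabic nucleus
    end: float     # ending of the syllable


def smear_boundary(boundary_time: float,
                   syllables: List[Syllable]) -> Optional[Tuple[float, float]]:
    """Bracket a detected boundary between two neighboring syllables.

    `syllables` is assumed to be sorted by nucleus center. The region runs
    from the onset of the syllable whose nucleus center immediately precedes
    the boundary to the end of the syllable whose nucleus center follows it.
    """
    before = [s for s in syllables if s.center <= boundary_time]
    after = [s for s in syllables if s.center > boundary_time]
    if not before or not after:
        return None  # assumed edge-case handling, not from the report
    return (before[-1].onset, after[0].end)


def word_preceded_by_boundary(word_start: float, word_end: float,
                              region: Tuple[float, float],
                              stressed_on_first_syllable: bool) -> bool:
    """Apply the matching rule, including the exception for non-initial stress."""
    lo, hi = region
    if stressed_on_first_syllable:
        return lo <= word_start <= hi
    # Exception: any overlap between the word span and the smeared region.
    return word_start <= hi and word_end >= lo
```

The wider acceptance test for words without initial stress reflects the fact
that the boundary is expected just before the first stressed syllable, which
for such words lies inside the word rather than at its beginning.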




All of the above ideas were implemented within the BBN system, but time
pressures at the end of BBN's ARPA contract unfortunately did not permit testing
of these prosodic aids to parsing. The final test would have come if parsing
and control traces with the use of prosodics could have been compared with
traces when prosodic information is not used. The project terminated before any
such results were available. Still, it seems likely from our hand analyses of
sixteen BBN control traces that prosodically-detected phrase boundaries can
improve the order in which alternative syntactic theories are tested. While we
obviously have just begun to offer specific prosodic aids to parsing and system
control, I believe we have shown enough of a pattern of improved ordering of
alternative theories and events to warrant future efforts to introduce prosodics
into parsing procedures in speech understanding systems.




3.  ACOUSTIC-PROSODIC  PATTERNS  IN  255  SENTENCES 

In previous reports (Lea, 1973a, 1973b, 1974, 1975a, 1976b; Lea, Medress,
and Skinner, 1973a,b; Lea and Kloker, 1975), we have reported on a variety of
experiments that demonstrate the utility of prosodic information in speech
understanding and have indicated various prosodic regularities such as the
marking of phrase boundaries by Fo valleys just before the first stress of a
phrase, and the marking of stressed syllables by a complex combination of
pitch rises and high energy integrals. We argued that our initial studies
with uncontrolled speech texts needed to be supplemented by much more controlled
studies with sentences which carefully isolated various linguistic contrasts and
their prosodic correlates. A list of 1100 sentences, each spoken by three
talkers, was designed and recorded to provide the necessary data for more
controlled studies.

3.1 Previous Determination of Sentences and Perceived Stress Patterns

We began our studies of prosodic regularities in the database sentences
by selecting 255 sentences, spoken by one talker, as a representative sampling
of linguistic contrasts that were to be studied (Lea, 1976c). These sentences,
listed again in Appendix A, include ones which can test the prosodic effects
of moving the first stress in a constituent from the first, to the second, to
the third (etc.) syllable (to see how the Fo-detected phrase boundary was then
positioned); expansions of noun phrases to include nouns, pronouns, articles,
quantifiers, adjectives, and participles; practically-identical word sequences
with contrasting syntactic structures; verb-versus-noun stress pairs
(permit/permit, etc.); NP-PP-PP subordination; various sentence types
(declarative, WH question, yes/no question, command); coordinate clauses, VP's,
and NP's; and relative clauses.

Presented in our last report (Lea, 1976c) were listeners' perceptions
of stress patterns in the 255 sentences. We confirmed the consistency of
listeners' perceptions from time to time and from listener to listener, and
showed which word categories are usually stressed, which are perceived as
unstressed, and which are reduced. We showed how coordinate structures and
subordination produce reduced stress on verbs, function words, and repeated
nouns. Perceived stresses were found to consistently decrease throughout a
word sequence without right syntactic brackets, even though nuclear stress
rules predict patterns of increasing stress.




In the remainder of section 3, we will present results about the acoustic-
prosodic patterns in those same 255 sentences. We will see how various prosodic
hypotheses stand up to the data from these carefully controlled structural
distinctions, and what the acoustic correlates of the perceived stress patterns
are. Some improvements in the prosodic analysis tools were needed for these
studies, and they are briefly described in section 3.2. Then, in section 3.3, we
will discuss the regularities in placement of Fo valleys associated with phrase
boundaries, and present the level of performance of our computer programs for
detecting phrase boundaries. These designed sentences help firmly answer specific
questions about which constituents are marked by Fo-detected boundaries, and
where the boundaries occur (cf. Lea, 1975b, for related work on some others of
the database sentences). Section 3.4 covers performance of the stressed
syllable location algorithm, and the prerequisite process of syllabification.
Some intonational hypotheses that could be tested with the data are discussed
in section 3.5. More detailed studies of acoustic correlates of stress in various
sentence structures are suggested in section 3.6. Specific implications for
using these experimentally-verified prosodic regularities in speech understanding
systems are discussed in section 3.7.


3.2 Improved Methods for Acoustic-Prosodic Processing

In earlier studies (Lea, 1973f), we used a software analysis to obtain
the sonorant energy function from the values of LPC-analyzed spectral energy
in the band 60 Hz to 3000 Hz. Last year, we introduced an analog hardware
filter to replace those spectral computations (Lea, 1975b), but because of the
noisy signal and other minor problems obtained with the filter, plus recent
improvements in the speed of computation of our spectral analyses, we have
returned to the use of the software-derived sonorant energy function.


Our Fo tracker, which is based on an autocorrelation analysis (Skinner,
1973), has been improved by incorporating a weighting function with a 10%
slope on the autocorrelation function, to discourage octave errors. Thus, the
autocorrelation equation was given by

    \[ A_i = \sum_{j=1}^{N} C_j \, C'_{i+j-1} \]

for i = O_L to i = O_M, where i is the amount of offset, N is the length of the
window, C_j is the windowed and center-clipped waveform, C'_{i+j-1} is the
offset window to be correlated with C_j, and O_L and O_M are the lower and upper
autocorrelation offset limits, respectively (see Lea, Medress, and Skinner,
1973a, p. 31).




We next compute a straight-line approximation to a logarithmic function
that decreases the A_i's by WGT percent per octave:

    \[ \mathrm{SLOPE} = -(\log_2 O_M - \log_2 O_L) \cdot [\ldots] \]

We are currently using WGT = 10. Figure 5 shows a sketch of the weighting factor
(1 + (i - O_L + 1) · SLOPE), which is based on this logarithmic weighting function.

Finally, we weight the autocorrelation coefficient by that straight line:

    \[ B_i = A_i \, (1 + (i - O_L + 1) \cdot \mathrm{SLOPE}) \]

for each i, i = O_L to O_M. This causes the kind of change in the autocorrelation
function sketched in Figure 5, in which the larger offset is selected as the
peak in the autocorrelation function after the weighting, whereas the smaller
offset would have been erroneously selected before the weighting was applied.
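
The weighting step can be sketched in a few lines. This is a minimal
illustration rather than the tracker's actual code: the function names and the
dictionary representation of the autocorrelation values are hypothetical, and
`slope` is simply passed in as a given number, since its derivation is not
modeled here.

```python
def weight_autocorrelation(a, o_l, slope):
    """Return B_i = A_i * (1 + (i - o_l + 1) * slope) for every offset i in `a`.

    `a` maps each offset i (from o_l up to the upper limit) to its
    autocorrelation value A_i. `slope` is taken as given (an input, not derived).
    """
    return {i: a_i * (1 + (i - o_l + 1) * slope) for i, a_i in a.items()}


def peak_offset(values):
    """Offset with the largest (weighted or unweighted) autocorrelation value."""
    return max(values, key=values.get)
```

With a slope of zero the values are returned unchanged; a nonzero slope
rescales each offset linearly, so the offset of the largest weighted value can
differ from the offset of the largest unweighted value.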

Analysis  of  the  255  sentences  showed  that  some  octave  errors  still 
occasionally  occur,  but  much  less  frequently  than  without  this  weighting  function. 

The program for detecting phrase boundaries from Fo valleys (Lea, 1975)
is unchanged, except that the threshold for a substantial Fo valley is now a
change in Fo of 4 rather than 5 eighth tones.
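
As a rough illustration of what such a threshold means in hertz, the following
sketch tests whether a dip and the following rise in Fo each reach the
threshold. It is not the boundary detection program itself: the function names
are hypothetical, and the conversion assumes that an eighth tone is 1/48 of an
octave (six whole tones per octave), which is an assumption about the unit
rather than a definition taken from this report.

```python
import math

# ASSUMPTION: 6 whole tones per octave, 8 eighth tones per tone.
EIGHTH_TONES_PER_OCTAVE = 48


def eighth_tones(f_hz, ref_hz):
    """Interval from ref_hz up to f_hz, in eighth tones."""
    return EIGHTH_TONES_PER_OCTAVE * math.log2(f_hz / ref_hz)


def is_substantial_valley(peak_before_hz, valley_hz, peak_after_hz, threshold=4):
    """True if Fo both dips into and rises out of the valley by >= threshold."""
    dip = eighth_tones(peak_before_hz, valley_hz)
    rise = eighth_tones(peak_after_hz, valley_hz)
    return dip >= threshold and rise >= threshold
```

Under this assumption, a valley at 100 Hz between peaks of 110 Hz and 106 Hz
qualifies with a threshold of 4 eighth tones but not with a threshold of 5,
since the rise to 106 Hz is only slightly more than 4 eighth tones.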

Recently, we changed our syllabification procedure in several minor ways
and one major way. The minor changes included: requiring all frames within
a syllabic nucleus to be voiced; disallowing overlapping nuclei that
occasionally occurred; setting a minimum energy level for a peak in energy to
qualify as the peak of a syllable; smoothing one-point jumps or dips in sonorant
energy that occasionally caused erroneous syllable detections; and adjusting
the threshold value for a minimum dip and rise in sonorant energy to qualify
as a syllable boundary (now 4 dB rather than 5 dB), to help pick up some missed
syllables. These changes helped reduce the number of erroneous "syllables"
detected and helped locate some syllables that had previously been missed.

The major change in the syllabification procedure was to redefine the
beginning and ending of a syllabic nucleus. Previously, the nucleus endpoints
were defined as those points where energy first dipped 4 or 5 dB below the peak
energy level in the syllable. This often excluded from the syllable nucleus
much of the non-vowel sonorant portions of the syllable, and even some of the
lower-energy regions of the vowel. An investigation showed that the syllable
nucleus limits lined up best with the endpoints of sonorant phonetic segments
in available transcriptions when the syllable nucleus limits were defined as
those points, as one moves either way from an energy dip at a syllable boundary,
where energy first rises to at least one half of the distance between the low
energy in the dip and the peak energy in the syllable nucleus. That is, if
D(i-1) is the energy level in the dip before a syllable nucleus, D(i) is the
energy level in the dip after the nucleus, and P(i) is the peak energy level
within the nucleus, then the beginning of the nucleus is now the first point
where energy E satisfies the following inequality:

    E > (P(i) + D(i-1))/2.

The endpoint of the nucleus is the last point in time before the following
dip at which energy E satisfies the following similar inequality:

    E > (P(i) + D(i))/2.

A major  revision  has  been  made  in  the  nucleus  finding  subroutine  to 
accomplish  this  change,  and  it  has  been  tested  and  found  to  be  working 
correctly.  One  obvious  complication  that  this  introduces  is  that  the  nucleus 
durations  no  longer  agree  with  those  previously  expected  by  the  stress 
identification  subroutine,  so  that  some  errors  in  stress  location  may  result. 
This  requires  some  further  testing  and  possible  changes  in  the  location 
routine,  to  correctly  select  stressed  nuclei. 
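
The redefined nucleus limits can be sketched as follows. This is a minimal
illustration, not the revised subroutine: the function name, the frame-indexed
list of energies in dB, and the assumption that the peak energy is strictly
greater than the energy in both dips are all hypothetical.

```python
def nucleus_limits(energy, dip_before, peak, dip_after):
    """Return (start, end) frame indices of a syllabic nucleus.

    energy     -- sonorant energy per frame, in dB
    dip_before -- index of the energy dip preceding the nucleus, D(i-1)
    peak       -- index of the energy peak within the nucleus, P(i)
    dip_after  -- index of the energy dip following the nucleus, D(i)
    Assumes energy[peak] is strictly greater than the energy at both dips.
    """
    start_threshold = (energy[peak] + energy[dip_before]) / 2
    end_threshold = (energy[peak] + energy[dip_after]) / 2
    # First frame after the preceding dip where E > (P(i) + D(i-1))/2.
    start = next(i for i in range(dip_before, peak + 1)
                 if energy[i] > start_threshold)
    # Last frame before the following dip where E > (P(i) + D(i))/2.
    end = next(i for i in range(dip_after, peak - 1, -1)
               if energy[i] > end_threshold)
    return start, end
```

Because each threshold is set halfway between the dip and the peak, a nucleus
bounded by a deep dip on one side and a shallow dip on the other gets limits
placed at different energy levels on its two sides, which a fixed drop below
the peak would not allow.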

Besides  these  revisions  in  syllabification,  we  also  made  minor  improvements 
in  the  stress  finding  routines,  removing  a few  minor  bugs  and  also  modifying 
the procedure for defining the position of peak fundamental frequency (Fo)
in  a phrase.  With  this  modification,  local  Fo  jumps  after  unvoiced  consonants 
do  not  cause  a displacement  in  the  peak-Fo  point  before  which  we  search  for 
the  first  stress  in  the  phrase. 

A preliminary test showed that the high-frequency (650 - 3000 Hz) sonorant
energy function, when used in place of the regular (60 - 3000 Hz) sonorant
energy function, could detect some syllable boundaries at intervocalic
non-vowel sonorants, but it also was less smooth than the regular sonorant energy
function, and introduced some erroneous syllable boundaries. Using both
functions together, or using some other spectrally-weighted energy function, may
permit the detection of more syllabic boundaries without introducing erroneous
boundaries.

3.3 Intonational Phrase Boundaries

The results in syntactic boundary detection from Fo contours are summarized
in Table I. Each subset of sentences (listed completely in Appendix A) is
listed separately, with a brief explanation of the structures it tests, the
number of expected (syntactically-predicted) boundaries in the subset, and the
numbers of correctly-detected boundaries, extra (syntactically-unexpected, but
still apparently syntactically-related) boundaries, and false (phonetically-
produced) boundaries. The overall figure of 76% of all expected boundaries being
correctly detected compares fairly well with scores ranging from seventy percent
to over ninety percent found in our previous studies (Lea, 1972, 1973c,
1976c). This is very encouraging, in light of the difficult cases for boundary
detection included in this database. Many of the sentences are all-sonorant
to avoid false boundaries associated with Fo changes near obstruents. However,
this decreases the likelihood of sufficient Fo variations at expected boundary
positions. Also, many sentences are very short (e.g., in subsets 1A, 1E, 2C2,
3F, 6A, 7B, and other scattered sentences), and previous experience (as well
as published claims; Armstrong and Ward, 1929) predicts that such sentences will
be more monotone and not as likely to exhibit the substantial Fo variations
needed for boundary detection. We could lower the threshold for a substantial
valley to 4 eighth tones Fo dip and rise in this study, precisely because the
less frequent occurrence of obstruents meant less likelihood of false
phonetically-produced boundaries.

One can use the boundary detection results for the individual sentences and
subsets in this study to evaluate our basic boundary detection hypothesis and
to more firmly establish exactly which constituents are accompanied by Fo-
detected boundaries. Our basic boundary detection hypothesis can be stated
as follows:

    A substantial (4 eighth-tone) Fo valley will occur just before the
    first stressed syllable of each major syntactic constituent which
    is preceded by another constituent containing a stress. Major
    syntactic constituents include: main verbs; auxiliary verbs if and only
    if they contain a stress, such as a negative; noun phrases; sentence
    adverbs; conjuncts; relative clauses; and parentheticals. In addition,
    a boundary will occur before a preposition if it is stressed (besides
    the one that will occur before the stressed noun phrase within the
    prepositional phrase). A boundary will occur between the parts of a
    compound noun.




TABLE I. SUMMARY OF BOUNDARY DETECTION RESULTS

                                              BOUNDARIES
SUBSET, AND STRUCTURES STUDIED                 EXPECTED      HITS         EXTRA        FALSE
                                                   N        N     %      N     %      N     %

1A   Stress Movement, 1 Stress/constituent        44       21   46%      —     —      5   19%
1B   Stress Movement, Expanding Determiner        26       20   77%      3   13%      —     —
1C   Stress Movement, NP Modifiers, ≤4 S/C        50       46   92%      6   11%      3    6%
1D   Stress Movement, NP Modifiers, >4 S/C        54       49   91%     25   32%      3    4%
1E   "Flying Planes" Paradigm                     12       12  100%      —     —      —     —
2C2  Stress Movement in First Constituent         30       19   63%      2    9%      3   13%
3D   Verb/Noun Stress Pairs                       58       56   97%     15   18%     24   28%
3F   Phonetic Influences                          28       18   64%      3   13%      3   13%
4C   NP-PP-PP Subordination                       25       19   76%      2    9%      1    5%
6A   Commands                                     33       27   82%      1    4%      —     —
7B   Yes/No Questions                             16       10   56%     14   54%      2    8%
7D   WH Questions                                 46       33   72%      2    6%      —     —
8A   Coordinate Sentences                        111       71   64%      4    5%      1    2%
8B   Coordinate Verb Phrases                      55       46   84%      4    8%      1    2%
8C   Coordinate Noun Phrases                      36       33   82%      9   20%      3    7%
11A  Relative Clauses                             60       39   65%     10   18%      8   14%

TOTALS FOR ALL SUBSETS                           686      519          100           57
OVERALL PERCENTAGES                                             76%          15%           8%




Let us consider what each of the subsets teaches us about the adequacies
and inadequacies of this hypothesis. Subset 1A, for example, has very simple short
sentences of the forms (NP V), (NP V NP), and (NP AUX V NP). We expect a
boundary before the main verb (V) and before the second NP. In fact, only 8
of the 24 main verbs in subset 1A were preceded by boundaries. Five of the
eight were before the word "worry"; in fact, all of the cases where the verb
was "worry" were preceded by boundaries, while the verbs "know", "knew", "owe",
and "enroll" were not preceded by boundaries, unless there was an (unstressed)
auxiliary preceding those verbs. Is there something about the phonetic structure
(prestressed /w/ versus other sonorants) that causes this apparent lexical
difference? Or, is the difference due to the different stress patterns of the
verbs (SU versus either S or US)? Certainly the presence of boundaries before a
main verb when the verb is preceded by an unstressed auxiliary suggests that
the alternating stress pattern SUS is more likely to have a boundary before the
second S than the pattern SS. This is to be expected, since Fo is usually
lower in unstressed syllables than in stressed ones, so the Fo valley should
more readily occur when an unstressed syllable intervenes between stresses.
This explains the three cases where a main verb (other than "worry") preceded
by an auxiliary was accompanied by a boundary. Still, we would expect,
according to the basic boundary-detection hypothesis, that a boundary should
occur in the SS sequence, between the stressed noun subject of the sentence
and the stressed verb. We have previously tried (Lea, 1973b) to accept the
counterhypothesis that no boundary will occur between a noun and a following
verb, but the five cases of boundaries with "worry" discount that
counterhypothesis.

Subset 1A also showed that 12 of the 18 expected boundaries between main
verbs and following object NP's were detected. All those that were missed had
SS sequences at the verb-noun boundary, suggesting again that boundaries are
less reliably detected within SS sequences. However, there were four SS sequences
at the V-NP boundary that were accompanied by detected boundaries. We can say
that if the basic boundary detection hypothesis predicts a boundary and the
phrase boundary is spanned by an SUS sequence, the boundary will be detected;
if the boundary is immediately preceded and followed by stresses, it may or may
not be detected.

These uncertainties suggest that more data need to be examined, to
determine what causes some but not all NP-V and V-NP boundaries to be detected.
Table II shows the results of analyzing all NP-V and V-NP boundaries in the
255 sentences, excluding those cases where other factors like coordinate
NP's, relatives in the NP, or other structural issues might interfere. (Note
that SU*S means cases where one or more unstresses occur between the last
stress of the previous phrase and the first stress of the next phrase.) The
totals for all subsets, and the results for the individual subsets, show that
both the NP-V and V-NP boundaries are less likely to be evident when two stresses
are adjacent, while intervening unstressed syllables aid the Fo-marking of
boundaries. Yet, this effect is much more pronounced for NP-V boundaries than
for V-NP boundaries. In this sense, the V-NP boundary is a much more stable,
reliably detected boundary than is the NP-V boundary. Contrary to the assumption
in many published works (Lieberman, 1967; Scholes, 1969; Oller, 1973), the
subject-predicate (or NP-V) boundary is not the most robust or prominent
boundary in acoustic data (cf. Lea, 1972, 1973c).

TABLE II. CORRECT DETECTIONS OF NP-V AND V-NP BOUNDARIES
IN THE 255 SENTENCES

                                                NP-V BOUNDARIES              V-NP BOUNDARIES
SUBSET NUMBER AND STRUCTURES STUDIED            SS            SU*S           SS            SU*S
                                                SEQUENCES     SEQUENCES      SEQUENCES     SEQUENCES

1A   Stress Movement, 1 S/C                     4/15=27%      4/9=44%        3/9=33%       8/8=100%
1B   Stress Movement, Expanding Determiner      —             10/13=77%      9/11=82%      1/2=50%
1C   Stress Movement, NP Modifiers, ≤4 S/C      —             22/25=88%      13/14=93%     11/11=100%
1D   Stress Movement, NP Modifiers, >4 S/C      —             23/27=85%      19/20=95%     7/7=100%
1E   "Flying Planes" Paradigm                   —             4/4=100%       —             8/8=100%
2C2  Stress Movement in 1st Const.              1/4=25%       9/11=82%       2/6=33%       7/9=78%
3D   Verb/Noun Stress Pairs                     —             13/14=93%      2/2=100%      9/9=100%
3F   Phonetic Influences                        2/5=40%       3/4=75%        10/12=83%     —
4C   NP-PP-PP Subordination                     —             2/5=40%        5/5=100%      0/2=0%
6A   Commands                                   —             —              5/8=63%       7/7=100%
7B   Yes/No Questions                           5/5=100%      1/5=20%        3/6=50%       1/1=100%
7D   WH Questions                               4/8=50%       3/7=42%        4/7=57%       4/4=100%
8A   Coordinate Sentences                       1/16=6%       14/24=58%      22/32=69%     6/8=75%

TOTALS FOR ABOVE SUBSETS                        17/53=32%     108/148=73%    97/122=79%    69/76=91%

We may very well be disappointed that the acoustic detection of phrase
boundaries cannot be simply explained in purely syntactic terms, without the
need for disclaimers or qualifying phrases related to lexical choices, stress
patterns, or phonetic structures. But, until we have an adequate model of all
influences on Fo contours, we apparently must acknowledge such loose generalities
as the notion that boundaries are "more likely to be detected" when unstresses
intervene between the last stress of one phrase and the first stress of the
next phrase.

There are a number of other interesting results embedded in the figures of
Table II. For example, the rightmost column of the table shows that excellent
performance in boundary detection was obtained with V-NP boundaries accompanied
by SU*S sequences. Two of the seven missed V-NP boundaries with SU*S sequences
were accompanied by borderline Fo variations of 3 eighth tones. Four others
involved words (repeated nouns and the command verb "put") that were predicted
to be stressed, but that the listeners did not actually hear as stressed, so no
boundary should be expected. Thus, there are actually only 3 out of 72 expected
V-NP boundaries with SU*S sequences that were not detected. Some of the other
lower scores in the table result from cases where predicted stresses were not
actually perceived as stressed. For example, lower scores with coordinate
sentences (subset 8A) resulted from predicting that the repeated verbs and nouns
would be stressed, when in fact repeated words were usually not stressed. The
possessive pronoun "mine" was predicted to be stressed for the commands in
subset 6A, but several boundaries were not detected before those pronouns since
"mine" was not actually stressed as expected. A glance at the stress patterns
and boundary results shown in Appendix A for each individual sentence can assure
one that a significant fraction of the "missing" boundaries should have been
missing, since the surrounding words were not stressed as expected.
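
The pooled comparison behind these remarks is simple arithmetic on the totals
row of Table II, and can be checked with a short sketch. The variable and
function names are hypothetical; the counts are the totals quoted in Table II
and the adjusted figure of 3 misses out of 72 given in the text.

```python
# Totals row of Table II: (correctly detected, expected) boundaries.
TABLE_II_TOTALS = {
    ("NP-V", "SS"): (17, 53),
    ("NP-V", "SU*S"): (108, 148),
    ("V-NP", "SS"): (97, 122),
    ("V-NP", "SU*S"): (69, 76),
}


def detection_rate(hits, expected):
    """Percentage of expected boundaries that were detected."""
    return 100.0 * hits / expected


for (boundary, context), (hits, expected) in TABLE_II_TOTALS.items():
    print(f"{boundary:5s} {context:5s} {detection_rate(hits, expected):5.1f}%")

# V-NP boundaries with SU*S sequences, after setting aside the four cases
# where the predicted stress was not actually heard: 3 misses out of 72.
adjusted_rate = detection_rate(72 - 3, 72)
print(f"adjusted V-NP SU*S: {adjusted_rate:.1f}%")
```

The gap between the SS and SU*S rates is far larger for NP-V boundaries than
for V-NP boundaries, which is the sense in which the V-NP boundary is the more
stable of the two.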

Boundary detections for other types of phrase structures are given in Table
III. Clearly, most other types of expected boundaries are very reliably detected.
The only V-PP boundary missed was in a relative clause, which had reduced stresses
and a fairly flat Fo contour. Over half of the AUX-V boundaries missed were in
coordinate structures where the verb was not perceived as stressed, and thus
these structures would not be expected to be accompanied by boundaries. All of
the missing NP-PP boundaries were before a short utterance-final PP ("from Maine",
"of May") with weak stress on the noun and, consequently, fairly flat Fo contours.
The two missing NP-NP boundaries also involved low stresses due to occurrence in
a coordinate structure or a relative clause. The two missing NP-ADV boundaries
were before moderate utterance-final stresses, but exhibited 3-eighth-tone Fo
valleys (almost sufficient to be detected as boundaries). Both of the cases where
a relative pronoun was not followed by a boundary had an unexpected boundary before
the relative, and small Fo valleys after the relative pronoun. Two boundaries
before stressed auxiliaries (involving negatives) also were not found, apparently
due to reduced stresses in coordinate constructions. Clearly, one of the primary
causes for missing some expected boundaries is the reduced stresses that accompany
subordination, relative clauses, and repetition in coordinate constructions.
Comprehensive stress rules could predict the absences of stresses in such
structures, and no boundaries would be expected in such circumstances.

Table IV summarizes where extra, unexpected boundaries occurred in the 255
sentences. Several regularities suggest the value of modifying the current phrase
boundary location hypothesis (page 26). In particular, there seems to be a regular
occurrence of a boundary before the noun (or, if the noun is an unstressed pronoun,
before the verb) in a yes/no question, even though the initial auxiliary verb is
perceived as unstressed or reduced. Perhaps such boundaries should be expected.
Similarly, while our basic hypothesis that there is no boundary internal to an
NP (except for compounds) seems to hold in the majority of cases, the fact that
two thirds of the ADV-ADJ boundaries are accompanied by Fo valleys suggests that
perhaps those boundaries should be expected. Finally, we might revise the
hypothesis to predict a detected boundary between a NP and its following relative
pronoun. This seems particularly strange, since the relative pronoun is heard
as unstressed.




TABLE III. CORRECT DETECTIONS OF VARIOUS TYPES OF EXPECTED BOUNDARIES

PHRASE STRUCTURE           BOUNDARIES     HITS     PERCENTAGE
TYPE                       EXPECTED                CORRECTLY
                                                   DETECTED

Boundaries At Verbals:
  V-ADV                         2           2        100%
  V-PP                         10           9         90%
  AUX-V                       120          96         80%

Boundaries At NP's:
  N-N Compounds                 7           7        100%
  Preposed ADV-NP               2           2        100%
  NP-PP                        32          28         88%
  NP-NP                        17          15         88%
  NP-ADV                       13          11         85%
  Relative Pronoun-NP          10           8         80%
  NP-(AUX+NEG)                  7           5         71%

Conjoined Structures:
  (VP Conj)-VP                 12          12        100%
  NP,-(NP, Conj NP)             5           5        100%
  VP,-(VP, Conj VP)             3           3        100%
  (S Conj)-S                   22          21         95%
  (NP Conj)-NP                 18          17         94%



TABLE IV. UNEXPECTED (EXTRA) BOUNDARY DETECTIONS

PHRASE STRUCTURE             OCCURRENCES    NUMBER OF     PERCENTAGE OF
TYPE                         OF THE         BOUNDARIES    OCCURRENCES THAT
                             STRUCTURE      DETECTED      WERE DETECTED

Boundaries At Verbals:
  (AUX+PRONOUN)-V in Y/N?         3             3            100%
  AUX-NP in Y/N?                  8             6             75%
  NP-AUX                        125             4              3%

Boundaries Within NP's:
  ADV-ADJ                        12             8             67%
  ADJ-N                          69            22             32%
  QUANT-ADJ                      19             6             32%
  QUANT-N                        12             3             25%
  ADJ-ADJ                        27             6             22%

Other Boundaries:
  NP-Relative Pronoun            17            15             88%
  Within Last Word of Y/N?       14             5             36%



Interestingly, eighteen (i.e., 40%) of the unexpected boundary detections in
NP's involve the words "moral" or "immoral", even though those words occur in
only 20 (22%) of the multiple-stress NP's. Over 80% of the unexpected boundaries
within NP's involved multisyllabic words (and thus alternating stressed-unstressed
patterns), suggesting (as we found with the correctly detected NP-V and V-NP
boundaries on pages 21 to 23) that unstresses between stresses increase the
chance of boundaries being detected.

Also  listed  in  Table  IV  are  five  cases  where  extraneous  boundaries  were  found 
within  the  last  word  of  a yes/no  question.  This  was  due  to  an  Fo  valley  appearing 
just  before  the  terminal  rise  in  Fo  that  often  accompanies  yes/no  questions. 

One other category of boundaries that needs mention is the "false" boundaries
listed in the rightmost column of Table I (page 20). Twenty false boundaries (35%)
were in the initial syllable of an utterance, resulting from local Fo variations
at voice onset (perhaps due to glottal stops or such). These could be eliminated
by setting a minimum time between the onset of voicing in an utterance and the
time of the first possible phrase boundary. Another thirty (53%) of the false
boundary detections resulted from Fo variations associated with non-initial obstruent
consonants. Other false boundary detections resulted from errors in the Fo
contour, and phrase-final terminal rises in Fo. A graphic illustration of how
false boundaries are introduced by the presence of obstruents is that twenty four
(42%) of the false boundaries were in subset 3D, which has many obstruents in it.
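The proposed remedy for utterance-initial false boundaries (a minimum time between voicing onset and the first allowable boundary) can be sketched as a simple filter. This is only an illustration, not the BOUND 3 code: the function name, the millisecond units, and the 100 ms default are assumptions, since the report does not state a value.

```python
def drop_early_boundaries(boundary_times_ms, voicing_onset_ms, min_gap_ms=100.0):
    """Discard boundary candidates that fall too soon after voicing onset.

    min_gap_ms is a hypothetical setting; the report proposes the idea
    of a minimum time but gives no number.
    """
    earliest_allowed = voicing_onset_ms + min_gap_ms
    return [t for t in boundary_times_ms if t >= earliest_allowed]


# A candidate 30 ms after voicing onset is treated as an onset artifact.
print(drop_early_boundaries([30.0, 420.0, 910.0], voicing_onset_ms=0.0))
```

The threshold would have to be tuned so that genuine early boundaries after a short initial phrase are not discarded.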

In  summary,  it  appears  we  are  very  near  optimal  attainable  performance  in 
phrase boundary detection from Fo contours, with few possibilities of improvement
by revisions in the computer program. For some ideas about further detailed
improvements  in  the  BOUND  3 phrase  boundary  detection  program,  see  our  previous 
semiannual  report  (Lea,  1976c).  However,  results  with  the  255  sentences  suggest 
that  the  places  where  boundaries  should  be  predicted  could  deserve  further  study. 

We  have  found  that  expected  stress  patterns  should  be  taken  into  account,  so  that 
boundaries  will  not  be  expected  in  coordinate  structures  with  repeated  words,  or 
in  some  subordinate  structures.  Even  the  presence  or  absence  of  unstresses  between 
stressed  syllables  could  be  used  to  refine  the  probability  of  detecting  a phrase 
boundary.  On  the  other  hand,  boundaries  should  perhaps  be  expected  at  the  first 
stress  after  the  auxiliary  verb  in  a yes/no  question,  between  a noun  phrase  and 
the  relative  pronoun  of  its  subordinate  relative  clause,  and  perhaps  even  between 
an adverb and the adjective it modifies within a noun phrase. The minimal contrasts
in structure within pairs of the database sentences have been useful in highlighting
these  refinements.  Further  work,  with  other  talkers  and  more  structures, 
should  be  done. 

3.4 Syllabification and Automatic Location of Stress

Syllables  are  located  in  the  spoken  sentences  by  detecting  substantial 
(4 dB) dips in sonorant (60-3000 Hz) energy which occur in the consonantal region
of  syllable  boundaries.  The  syllabic  nucleus  (vowel  and  some  adjacent  sonorant 
consonants)  is  centered  around  the  local  peak  in  sonorant  energy,  with  beginning 
and  ending  points  of  the  nucleus  located  at  those  points,  closest  to  the  preceding 
and  following  dips,  whose  energy  is  at  least  half  of  the  distance  from  the 
value  in  the  dip  to  the  value  at  the  syllabic  peak. 
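The syllabification rule just described can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the report's program: the energy contour is taken as a list of dB values per frame, the 4 dB default is our reading of the dip threshold, and the function name and hysteresis-style scan are hypothetical choices.

```python
def find_nuclei(energy_db, min_dip_db=4.0):
    """Return one (start, peak, end) frame-index triple per syllabic nucleus."""
    if not energy_db:
        return []
    peaks, dips = [], [0]
    peak = low = 0
    for i in range(1, len(energy_db)):
        e = energy_db[i]
        dip_depth = energy_db[peak] - energy_db[low]
        if dip_depth >= min_dip_db and e - energy_db[low] >= min_dip_db:
            # Energy fell far enough below the peak and has risen again:
            # the dip is substantial, so close this syllable at the dip.
            peaks.append(peak)
            dips.append(low)
            peak = low = i
        elif e > energy_db[peak]:
            peak = low = i          # new higher peak; restart dip tracking
        elif e < energy_db[low]:
            low = i                 # deeper point of a possible dip
    peaks.append(peak)
    dips.append(len(energy_db) - 1)

    nuclei = []
    for k, p in enumerate(peaks):
        d0, d1 = dips[k], dips[k + 1]
        # Nucleus edges: points nearest each dip whose energy is at least
        # halfway from the dip value up to the peak value.
        half_left = (energy_db[d0] + energy_db[p]) / 2.0
        half_right = (energy_db[d1] + energy_db[p]) / 2.0
        start = next(j for j in range(d0, p + 1) if energy_db[j] >= half_left)
        end = next(j for j in range(d1, p - 1, -1) if energy_db[j] >= half_right)
        nuclei.append((start, p, end))
    return nuclei


print(find_nuclei([40, 50, 60, 50, 42, 52, 62, 52, 40]))  # two nuclei
print(find_nuclei([40, 60, 58, 60, 40]))                  # shallow dip: one nucleus
```

The second call shows the failure mode discussed in this section: a 2 dB dip, as an intervocalic sonorant might produce, does not separate two syllables, so they merge into one long nucleus.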

The 255 sentences provide very demanding tests of the syllabification
procedure, since many sentences are all-sonorant. Sonorant consonants do not produce
large energy dips such as obstruents do. A large majority of the failures to
locate syllables in previous studies have been due to intervocalic sonorants not
providing the necessary dips in energy, so two or more syllables in an all-
sonorant sequence appear as one nucleus. (Most of these long combinations of
syllables then appear as single stressed syllables.)

Table V shows the syllabification results for each subset of the 255 database
sentences that have been studied. The overall result of 91% correct detection
of expected syllables is very satisfying, particularly for this difficult data.
Also, the predictions of expected syllables were biased against the syllabification
procedure, in that words like "tower", "moral", and "eyeing" were predicted to
be two syllables even though one could anticipate that the two syllables could
merge into one in actual speech.

Fifty one (29%) of the 175 syllables that were not automatically detected
were the weak syllables in five words (moral, immoral, Mary, marry, ruin) that
were always detected as having fewer syllables than expected. Other less frequent
words like "worry", "Armenian", "tower", "Murray", "Marion", "aluminum",
"erring", etc., were also consistently found with fewer syllables than expected,
and accounted for over 25 (14%) of the missing syllables. Other words that often,
though not always, were found with fewer syllables than expected included "really",
"marine", "airmen", "enroll", etc.




TABLE V. SYLLABIFICATION RESULTS

SUBSET NUMBER, AND                            SYLLABLES       HITS          FALSE
STRUCTURES STUDIED                            EXPECTED      N      %       N     %

1A   Stress Movement, 1S/C                       102        87    85%      2    2%
1B   Stress Movement, Expanding Determiner        91        84    92%      2    2%
1C   Stress Movement NP Modifiers, 4S/C          192       176    92%      0    0%
1D   Stress Movement, NP modifiers, 4S/C         256       217    85%      1    0.5%
1E   "Flying-Planes" Paradigm                     50        43    86%      0    0%
2C2  Stress Movement in 1st Const.               106        83    78%      1    1%
3D   Verb/Noun Stress Pairs                      236       222    94%      4    2%
3F   Phonetic Influences                          54        54   100%      3    5%
4C   NP-PP-PP Subordination                       86        85    99%      1    1%
6A   Commands                                     99        86    87%      0    0%
7B   Yes/No Questions                             73        68    93%      1    1%
7D   WH Questions                                 99        97    98%      0    0%
8A   Coordinate Sentences                        189       181    96%      1    1%
8H   Coordinate Verb Phrases                     115       100    87%      1    1%
8K   Coordinate Noun Phrases                     116       110    95%      2    2%
11A  Relative Clauses                            120       116    97%      4    3%

TOTALS FOR ALL SUBSETS                          1984      1809            23
OVERALL PERCENTAGES                                               91%           1%




Some persistent tendencies to lose syllables by grouping two syllables
across word boundaries were also found. Twenty three (13%) of the missing
syllables were a result of the first two syllables of "will enroll" appearing
to be one syllable; "know a(n)" gave five other misses, while "enroll a(n)"
gave seven more.

Table V shows that all-sonorant subsets 1A, 1D, 1E, 2C2, 6A, and 8H are
the only ones with syllabification scores under 90%, while subsets with obstruents,
like 3D, 3F, and 4C give above-average success in syllable detection.

Also of interest are the 23 false alarms in syllable location, shown in the
rightmost column of Table V. Nine of these appear to be due to a bug in the
program which is currently allowing short (20 or 30 ms) chunks of high energy to be
called syllabic nuclei.¹ There is a test in the CHUNK program that should be
throwing out all syllable candidates whose nuclei are less than 40 ms in duration.
Related to these are four other cases where erroneous utterance-final or phrase-
final syllables are inserted due to our smoothing of the energy function, which
improperly brings some energy levels within noise up to the high energies needed
for syllable detection. Five other low-energy erroneous syllable detections
could be eliminated by setting a threshold of minimum energy below which no syllable
could be detected. Three cases occurred where it appears the talker actually
said an extra syllable, like "nine-uh" for "nine".
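The two screening tests mentioned above (rejecting nuclei shorter than 40 ms, and rejecting nuclei below a minimum energy) might look like the following. This is a hedged sketch rather than the CHUNK program itself: the `Candidate` record, the function name, and the 45 dB energy floor are hypothetical; only the 40 ms duration limit comes from the text.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    start_ms: float
    end_ms: float
    peak_db: float


def keep_plausible(candidates, min_duration_ms=40.0, min_peak_db=45.0):
    """Keep only candidates that are long enough and loud enough.

    min_peak_db is an illustrative value; the report proposes an energy
    floor but does not say where it should be set.
    """
    kept = []
    for c in candidates:
        long_enough = (c.end_ms - c.start_ms) >= min_duration_ms
        loud_enough = c.peak_db >= min_peak_db
        if long_enough and loud_enough:
            kept.append(c)
    return kept


candidates = [
    Candidate(0.0, 120.0, 60.0),    # ordinary nucleus
    Candidate(130.0, 150.0, 62.0),  # 20 ms burst of high energy
    Candidate(200.0, 300.0, 30.0),  # long but within the noise
]
print(keep_plausible(candidates))
```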

While  the  syllabification  results  are  very  satisfying,  there  is  room  for  some 
future  improvements.  One  promising  idea  is  to  modify  the  spectral  weighting  of 
the  energy  function  so  it  dips  more  reliably  and  substantially  in  non-vowel 
sonorants  at  syllable  boundaries.  In  some  preliminary  studies  with  other  data, 
we  found  that  a high-frequency  (650-3000Hz)  sonorant  energy  function  dipped  more 
in  intervocalic  sonorants  than  did  the  regular  sonorant  energy  function,  but  it 
was noisier or more prone to introduce false boundaries within syllables. The
two  together,  or  some  spectrally-weighted  "loudness  function"  that  dips  more 
in  intervocalic  sonorants,  would  seem  to  be  needed. 

¹ As this report went to press, this bug had been corrected, so that only 14 false
alarms remain.

Once syllables are located, we can then determine which ones are stressed.
Our STRESS program associates stresses with high-energy syllabic nuclei near the
Fo rise at the beginning of an Fo-detected phrase, and at local inflections in Fo at
later points in the phrases. The results of applying this program to the 255
sentences are shown in Table VI. The overall score of 92% correct locations of stressed
syllables is surprisingly good, and is comparable to our best results on previous
studies with read speech. Equally satisfying is the fairly low percentage of
stress locations that were false (i.e., located syllables that were not perceived
[...] of five listeners). In past studies (Lea, 1974a, p. 19), we have found
comparable figures for such false alarms, but the new definition of nucleus durations
extending out to the half-way-down points, with resulting longer nucleus durations,
was expected to increase the likelihood of false stress locations.

TABLE VI. STRESS LOCATION RESULTS

SUBSET NUMBER, AND                            STRESSES        HITS          FALSE
STRUCTURES STUDIED                               N          N      %       N     %

1A   Stress Movement, 1S/C                        60        55    92%      7    11%
1B   Stress Movement, Expanding Determiner        51        49    96%      8    14%
1C   Stress Movement NP Modifiers, 4S/C          112        97    87%     18    16%
1D   Stress Movement, NP modifiers, 4S/C         138       120    87%     25    17%
1E   "Flying-Planes" Paradigm                     24        24   100%      9    27%
2C2  Stress Movement in 1st Const.                44        42    95%      8    16%
3D   Verb/Noun Stress Pairs                       86        78    91%     34    30%
3F   Phonetic Influences                          44        42    95%      3     7%
4C   NP-PP-PP Subordination                       37        30    81%      8    21%
6A   Commands                                     57        56    98%      8    13%
7B   Yes/No Questions                             38        33    87%     17    34%
7D   WH Questions                                 64        61    95%      6     9%
8A   Coordinate Sentences                         93        88    95%     34    28%
8H   Coordinate Verb Phrases                      57        53    93%     15    22%
8K   Coordinate Noun Phrases                      54        50    93%     15    23%
11A  Relative Clauses                             74        69    93%     23    25%

TOTALS FOR ALL SUBSETS                          1033       947           238
OVERALL PERCENTAGES                                               92%           20%

Examination of the detailed stress patterns in the various database
subsets shows a few specific causes for many missed stresses and false locations.
For example, only subsets 1C, 1D, 4C, and 7B had less than 90% of all perceived
stresses correctly found. In subsets 1C and 1D, 10 of the 33 stressed syllables
that were not located were in utterance-initial position. An error in the
STRESS program occasionally caused such initial stresses to be missed, and
stress to be placed on the second syllable even though it had less Fo rise and
nucleus duration. Twelve (30%) of the 40 missed stresses in subsets 1C, 1D,
and 4C were for the word "young", while another ten (25%) were for the second
syllable of "enroll".

Sixty nine (29%) of all the 238 false stresses located in the full set of
255 sentences were cases of the auxiliary verb "will" (or, where the two
syllables are erroneously detected as one, then the sequence "will en-") being
called stressed. All but one of the eight cases of the auxiliary verb "are"
in subset 1E were falsely located as stressed. Subset 3D stands out as having
many false stress locations. This is the one subset where voiced and unvoiced
obstruents are very frequent, and it is the presence of such obstruents that
appears to cause all or almost all of the false stresses in that subset. Other
prominent sources of false stresses included: repeated words in coordinate
constructions (36, or about 56% of the false alarms in subsets 8A, 8H, and 8K);
conjunctions (12, or 19% of the false alarms in subsets 8A, 8H, and 8K); and
relative pronouns (8, or 35% of the false alarms in subset 11A). While many
of these words were not perceived as stressed, they showed many of the prosodic
correlates of stressed syllables. Perhaps these are instances of the listeners
hearing a stress level which is dictated more by expectations determined by the
sentence structure than by acoustic information. One other possible cause of
27 false stresses involves utterance-final or prepausal syllables. There is
a special test in the STRESS program that locates prepausal stresses if the
prepausal syllables are of sufficient duration. Since the durations of nuclei
are now usually defined longer than before (see pages 16 to 18), unstressed
prepausal syllables are more prone to be erroneously detected as stresses.

In summary, the stress location results were very good, though there is
room for improvement. This is evident when one considers that 18% of the 1809
syllables detected by the syllabification routine were either perceived as
stressed and not located as stressed, or perceived as non-stressed and located
as stressed; that is, 18% of the syllables were confused between perceived and
automatically detected stress levels. This is considerably more than the
3 to 6% confusions in perceived stress levels from trial to trial or from
listener to listener (Lea, 1976c, pp. 27-29). An ideal stress location algorithm
might be expected to exhibit around 5% confusions when compared with perceived
stresses. Part of that remaining 13% or so might be eliminated by better
syllabification results and by correcting the current errors that miss utterance-
initial stresses and introduce erroneous utterance-final (prepausal) stresses.
Other improvements could come from adjustments of syllabic durations, Fo values,
and energies on the basis of the vowel identity (or formant Fo values; see Lea,
1976c, p. 19) and phonetic context. Perhaps totally different ways of combining
Fo, energy, and duration cues could improve stress location scores, though other
published algorithms have not performed better (Sargent, 1975; Cheung, 1975).
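As a check on the 18% figure, the arithmetic can be reproduced from the totals quoted in this section (1033 perceived stresses, 947 correctly located, 238 false stress locations, 1809 detected syllables). The way the counts are combined below is our assumption about how the figure was obtained; the report gives only the result.

```python
perceived_stresses = 1033   # stresses perceived by listeners
hits = 947                  # perceived stresses that were located
false_locations = 238       # located stresses that were not perceived
detected_syllables = 1809   # syllables found by the syllabification routine

missed = perceived_stresses - hits          # stressed but not located
confusions = missed + false_locations       # either kind of disagreement
confusion_rate = confusions / detected_syllables

print(f"{missed} missed + {false_locations} false = {confusions} confusions")
print(f"confusion rate = {confusion_rate:.1%}")
```

Under this reading the rate comes to just under 18%, which agrees with the figure in the text.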

3.5 Testing Intonational Hypotheses

The success of the archetype algorithm for stressed syllable location
shows the following major features of the intonation of English sentences:

• Fo rises substantially at the first stress in each major phrase;

• Fo  falls  gradually  after  peaking  near  or  somewhat  after  the 
first  stress  in  the  phrase; 

• Fo  rises  slightly  at  all  other  stresses  in  the  phrases  (but 
usually  not  so  much  as  to  yield  a fall-rise  Fo  valley  within 
the  phrase). 

We also noted the following complicating effects of phonetic structure on
Fo contours:

• Fo dips during voiced obstruents, and

• Fo is high immediately after unvoiced obstruents, and then
rapidly drops to values dictated by the stress and other
large-unit intonational effects.

Fundamental frequency contours in the 255 sentences also show several
other types of intonational regularities besides Fo valleys at phrase
boundaries and increases of Fo at stresses. For example, we noted in the
previous section that 99% of the initial stresses in sentences were coincident
with (or immediately before) the Fo peak in the sentence. Turning this
around, we may note more generally that

• Fo  rises  steadily  in  the  initial  part  of  sentence,  until 
the  first  stress,  where  it  peaks. 

This regularity was noted at least as early as 1929 (Armstrong and Ward),
but is sometimes obscured in arbitrary sentences by effects such as phonetic
influences on Fo causing fall-rise valleys before the first stress, or
early brief jumps of Fo, after unvoiced obstruents, to values just higher
than the values in the initial stress. Sometimes a syllable after the first
stress can have a brief Fo peak above that in the first stress, due to an
unvoiced obstruent. However, in the 255 sentences analyzed, such effects
did not mask this regularity. The all-sonorant sentences obviously had no
such complications, and I found that the phonetic effects in the sentences
with obstruents could be eliminated by disallowing the first two Fo points
after a period of unvoicing from defining the peak, and simply stating that
the first stress is the syllable whose syllabic peak immediately precedes the
Fo peak. In a couple of borderline cases of stress, where some of the
listeners heard the utterance-initial syllable as stressed, and the computer
program located the syllable as stressed, the Fo peak appeared there even
though the majority of listeners didn't perceive the syllable as stressed.
These seem so much like initial stresses that they shouldn't be taken as
refuting the regularity of coincidence of Fo peak and initial stress. Also,
there were two cases where the Fo peak did not align with the first stress
because an emphasized syllable later in a sentence caused an unusually high
Fo in its region, which exceeded the Fo peak on the initial stress. However,
in both cases, this later Fo peak might be ruled out either by setting a
maximum time (after voicing onset) before the first stress must be encountered
or else by noting the presence of a large Fo valley before that delayed Fo
peak. This effect may be stated specifically as an intonational hypothesis:

• An  emphasized  or  contrastively  stressed  syllable  in  a 
sentence  will  have  an  unusually  high  peak  Fo  value  that 
can  equal  or  exceed  that  of  the  initial  stress  in  the 
sentence. 

Much  more  testing  would  be  needed  to  verify  this  effect  of  emphasis  and 
contrastive  stress. 
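The first-stress rule stated earlier in this section (ignore the first two Fo points after a period of unvoicing, find the Fo peak, and take the syllable whose syllabic peak immediately precedes it) can be sketched as follows. The data layout is an assumption made for illustration: `fo` holds one value per frame with `None` for unvoiced frames, `syllable_peaks` holds the frame index of each syllabic energy peak in ascending order, and the start of the utterance is treated as following unvoicing.

```python
def first_stress_index(fo, syllable_peaks, skip_after_unvoiced=2):
    """Return the index (into syllable_peaks) of the presumed first stress."""
    best = None      # frame index of the highest eligible Fo value
    voiced_run = 0   # voiced frames seen since the last unvoiced frame
    for i, value in enumerate(fo):
        if value is None:
            voiced_run = 0
            continue
        voiced_run += 1
        # The first few points after unvoicing may not define the peak.
        if voiced_run > skip_after_unvoiced and (best is None or value > fo[best]):
            best = i
    if best is None:
        return None
    # Syllabic peaks at or before the Fo peak; the last one is the stress.
    preceding = [k for k, frame in enumerate(syllable_peaks) if frame <= best]
    return preceding[-1] if preceding else None


fo = [None, 100, 110, 120, 130, 125, None, 150, 140, 118, 110]
print(first_stress_index(fo, syllable_peaks=[3, 7]))
print(first_stress_index(fo, syllable_peaks=[3, 7], skip_after_unvoiced=0))
```

With the skip in place, the brief jump to 150 after the unvoiced frame is ignored and the first syllable is chosen; with the skip disabled, the jump defines the peak and the second syllable is chosen instead. The emphasis cases discussed above would need the additional maximum-time or Fo-valley test, which is not modeled here.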

Another  intonational  regularity  that  was  verified  firmly  with  the 
255  sentences  was  the  following: 

• Fo  falls  after  its  highest  Fo  value  in  the  last  stress,  to  a 
low  value  at  the  end  of  each  declarative,  command,  and  WH 
question. Fo dips, then rises, within the last stress of
yes/no questions, and rises throughout subsequent unstresses.

This was found to be true for every sentence except two: one declarative
which the talker spoke with a sense of being incomplete ("Men will know...."
(KNOW WHAT?)) and a yes/no question with emphasis on a quantifier, so that
it seems more like a WH question of "How many" than a yes/no question ("Will
all your men know?"). With over 99.5% of the declaratives, commands, and WH
questions, and about 95% of the yes/no questions, satisfying this terminal
contour regularity, we can consider it well verified.

So,  we  now  have  Fo  rising  to  the  first  stress,  and  (for  all  but  yes/no 
questions)  falling  from  the  last.  What  happens  between  the  first  and  last 
stress? A preliminary study of the first 58 of the sentences showed that, in
91% of the cases:

• Fo  falls  from  one  stress  to  the  next  in  a sentence  (or  clause). 

The exceptions spanned major syntactic boundaries that were followed
by highly stressed syllables. Another regularity was that:

• Fo  on  unstressed  syllables  is  lower  than  on  all  preceding 
stresses,  and  is  usually  at  or  below  a value  along  the  line 
between  the  values  of  the  immediately  preceding  and  following 
stresses. 
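The two hypotheses just stated can be tested mechanically on any sentence for which syllable times, Fo values, and stress marks are available. The sketch below is one possible formulation: the tuple layout `(time_s, fo_hz, stressed)` and the function name are assumptions, only unstressed syllables lying between two stresses are checked against the interpolated line, and syllable times are assumed to be distinct.

```python
def check_stress_contour(syllables):
    """Return (rising_pairs, above_line) for a list of (time_s, fo_hz, stressed).

    rising_pairs: index pairs of successive stresses where Fo fails to fall.
    above_line:   indices of unstressed syllables whose Fo lies above the
                  straight line joining the neighboring stresses.
    """
    stressed = [i for i, (_, _, is_stressed) in enumerate(syllables) if is_stressed]
    rising_pairs = [(a, b) for a, b in zip(stressed, stressed[1:])
                    if syllables[b][1] >= syllables[a][1]]
    above_line = []
    for a, b in zip(stressed, stressed[1:]):
        ta, fa, _ = syllables[a]
        tb, fb, _ = syllables[b]
        for i in range(a + 1, b):
            t, f, _ = syllables[i]
            line = fa + (fb - fa) * (t - ta) / (tb - ta)  # linear interpolation
            if f > line:
                above_line.append(i)
    return rising_pairs, above_line


sentence = [(0.0, 200, True), (0.2, 170, False), (0.4, 180, True),
            (0.6, 175, False), (0.8, 150, True)]
print(check_stress_contour(sentence))
```

In this made-up example the stresses fall steadily, but the unstressed syllable at 0.6 s sits above the line between its neighbors and is reported as an exception.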




Exceptions  were  when  an  unstressed  syllable  had  higher  Fo  value  than  the 
preceding  (phrase- initial)  stress  because  the  stressed  syllable  was  short 
and  Fo  continued  to  rise,  plus  the  unstress  was  a high  vowel  while  the 
preceding  stressed  vowel  was  low  and  hence  had  an  intrinsically  lower  Fo 
(examples:  "any,"  "many"). 

I expected  from  published  claims  that  Fo  might  mark  subordination  of 
one phrase under another, but found no clear regularity. I also was unable
to  simply  characterize  any  unique  Fo  contours  in  coordinate  NP's  or  other 
coordinate  constructions  (cf.  Lea,  1972). 

Table VIb summarizes some prosodic cues that were found for one clear
structural contrast; namely parenthetical or appositive (non-restrictive)
relative clauses. The parenthetical (whose description and prosodic values
are to the left of the slashes in Table VIb) is preceded and followed by
longer time intervals between it and the surrounding syllables; that is, by
a form of brief pauses. Fo falls dramatically before the parenthetical, and,
after the parenthetical, rises substantially. A Tune II rise in Fo marks
the incompletion and interruption the parenthetical produces. There do seem
to be two distinctive types of parentheticals, though; one for which the
Tune II occurs before the parenthetical (L111 and L112) and one for which it
occurs at the end of the parenthetical (L117, L120, L121). Thus it appears
that:

• Parentheticals are demarcated by large Fo variations and long
intersyllabic time intervals at their boundaries, and by
Tune II Fo contours.

Finally, a very clear example of the potential of using Fo contours to
detect syntactic structures is shown for the sentences of the "Flying Planes
Paradigm" (named after the two alternative structures of the ambiguous sentence
"They are flying planes."). Sentences like "Lawmen are lying men", with
the structure NP-Copulative-ADJ+N, have Fo valleys (hence, phrase boundaries)
only before the ADJ. In contrast, sentences like "Lawmen are ruling Maine.",
of the structure NP-AUX-V-N, have valleys (boundaries) before the V and the N.
This was always true for all the sentences in subset 1E. Thus, Fo contours,
and Fo-detected boundaries, can clearly distinguish between alternative phrase
structures. It is interesting to note that no such contrast was evident in
perceived stress patterns (Lea, 1976c, p. 42).
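For four-word sentences of this paradigm, the contrast reduces to a lookup on where boundaries were detected. The sketch below assumes word positions are numbered from 0 ("Lawmen" = 0, "are" = 1, the -ing word = 2, the final noun = 3) and that a boundary is recorded by the index of the word it precedes; these conventions and the function name are illustrative only.

```python
def classify_flying_planes(boundaries_before):
    """Guess the structure from the word positions preceded by a boundary."""
    found = set(boundaries_before)
    if 2 in found and 3 in found:
        return "NP-AUX-V-N"            # e.g. "Lawmen are ruling Maine."
    if 2 in found:
        return "NP-Copulative-ADJ+N"   # e.g. "Lawmen are lying men."
    return "undetermined"


print(classify_flying_planes([2]))
print(classify_flying_planes([2, 3]))
```

A real system would have to map detected boundary times onto word positions first, which is outside the scope of this sketch.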


[Table VIb. Prosodic Cues to the Presence of Parenthetical ... (table body not recoverable)]


We may summarize by saying that many intonational regularities could be
noted  with  the  controlled  contrasts  in  the  255  sentences,  and  they  provide 
information  that  might  be  useful  in  speech  understanding  systems. 

3.6 Acoustic Correlates of Stress

The  stressed  syllable  location  program  does  a fairly  good  job  of  utilizing 
some  major  acoustic  correlates  of  stress.  However,  our  acoustic  prosodic 
analysis  of  the  255  sentences  provides  extensive  data  for  further  studies  of 
the  acoustic  correlates  of  stress.  The  Fo  contour  can  provide  peak  and  average 
Fo  values  in  each  syllable,  Fo  contour  slopes  and  shapes  within  each  syllable 
nucleus,  and  more  global  Fo  contour  features.  The  sonorant  energy  function 
and  syllabification  procedure  provide  the  duration  of  the  syllabic  nucleus, 
the  peak  energy  value  in  the  nucleus,  a measure  of  the  energy  integral  for 
the  nucleus,  and  other  energy  and  duration  information.  Further  study  of 
such  features  can  and  should  be  done.  In  particular,  such  studies  may  help 
devise  better  algorithms  for  stressed  syllable  location.  Unfortunately,  time 
did  not  permit  our  studying  such  data  in  any  detail. 

Further  studies  can  also  include  investigations  of  how  stress  decisions 
can  be  adjusted  to  take  account  of  the  intrinsic  prosodic  features  (phonetically- 
dictated  energies,  durations,  and  Fo  values)  of  various  vowels  and  consonants. 

We  know  that  an  unstressed  high  vowel  may  have  higher  Fo  than  a stressed  low 
vowel,  while  an  unstressed  low  vowel  may  have  higher  energy  and  longer  duration 
than  a stressed  high  vowel.  Voiced  consonantal  contexts  also  cause  a vowel 
to  be  longer,  its  Fo  to  be  somewhat  lower,  and  its  energy  to  be  somewhat 
higher.  Further  studies  could  perhaps  make  more  or  better  use  of  relative 
values  of  prosodic  features  in  comparing  one  syllable  with  its  neighbors. 

The controlled contrasts in the 3300-sentence speech data base should
be  very  useful  for  undertaking  such  future  studies  of  acoustic  correlates  of 
stress. 

3.7 Specific Implications for Speech Understanding Systems

In  summary,  the  studies  of  acoustic  prosodic  patterns  in  the  255  sentences 
have  provided  important  confirmation  of  our  procedures  for  phrase  boundary 
detection, syllabification, and stressed syllable location. They also have
suggested specific ways in which such algorithms might be improved. We thus
have  the  promise  of  improved  prosodic  analysis  tools  that  may  be  useful  in 
speech  understanding  systems. 

More  importantly,  these  studies  have  firmly  established  various  prosodic 
regularities  that  may  be  used  to  predict  prosodic  patterns  accompanying 
hypothesized  sentence  structures  within  speech  understanding  systems.  Regularities 
of  syllabification,  automatic  stress  assignment,  boundary  placement,  intersyllabic 
time  intervals,  and  Fo  contour  shapes  may  be  used  to  adjust  the  scores  on 
hypothesized  words  or  word  sequences,  in  a manner  similar  to  that  we  developed 
for  the  BBN  HWIM  system.  For  example,  we  may  predict  that  words  like  "worry" 
or  "moral"  may  be  detected  as  monosyllabic,  and  not  penalize  a word  match  that 
involved  only  one  acoustically-detected  syllable  for  such  theoretically- 
multisyllabic  words.  We  may  allow  "will  en-"  in  a sentence  like  "Ron  will 
enroll  airmen."  to  be  found  as  one  syllable,  or  we  may  first  try  to  improve 
syllabification  to  find  the  missing  syllable  boundary  (by  using  a new  spectrally- 
weighted  energy  function).  From  our  studies  we  may  predict  more  precisely 
just  which  syntactic  structures  will  exhibit  Fo  boundaries,  and  where  they 
will  be  positioned.  Then,  if  we  find  agreement  with  such  predictions,  we 
can  reward  that  structure  with  a higher  priority  in  the  hypothesizing  and 
testing.  Similarly,  if  stresses  occur  on  the  wrong  syllables  for  a certain 
structure  (that  is,  they  are  not  those  predicted  from  previously  observed 
regularities),  we  could  decrease  the  priority  of  hypothesizing  that  structure. 

Specific  structural  features,  such  as  the  sentence  being  a yes/no  question, 
a parenthetical  being  present,  or  one  of  alternative  syntactic  bracketings 
being possible, can also be gleaned from the prosodic data, to aid parsing
and  the  overall  control  strategy  of  a speech  understanding  system.  Much  has 
been  learned  from  the  study  of  the  controlled  speech  texts,  and  much  more  could 
be  learned  from  extensive  further  studies. 


4.  REVIEW  OF  PROSODICS  RESEARCH  PROGRAM 

Sperry Univac's research on prosodic guidelines to speech understanding
produced many experimental results that help us better understand how prosodic
structures relate to other aspects of linguistic structures, such as phonemic
sequences and phrase structures. We have presented theoretical arguments
about the need for extracting from the acoustic speech signal some prosodic
cues to the large-unit linguistic structure, without dependence upon the prior
determination of phonemic structure and recognition of the words in the sentence
(Lea, Medress, and Skinner, 1972a). Vital assumptions of a prosodically-guided
approach to speech understanding have been verified from a variety of experiments.
We thus have both theoretical and experimental reasons for promoting the use of
prosodic information in speech understanding systems. These will be outlined
in section 4.2, following a tabulation in section 4.1 of all major results from
our research for ARPA. Section 4.3 provides a review of our various cooperative
efforts to make prosodics an important aspect of speech understanding
systems.

In sections 4.4 to 4.7, we briefly review four major areas of prosodic
studies: intonation and phrase boundary detection (4.4), perceived stress patterns
(4.5), automatic location of stressed syllables (4.6), and timing cues to
linguistic structure (4.7). We also have made some initial attempts to define
and test some specific procedures for using prosodic information in speech
understanding systems (4.8). A major contribution suitable for aiding future
studies is our development of a large speech database with controlled linguistic
contrasts from sentence to sentence (4.9).

4.1  Overview 

Table VII on pages 41 to 43 summarizes the major contributions that have come
from Sperry Univac's ARPA-sponsored research. We have been leading advocates
of the use of prosodics in speech understanding, doing what we can to precisely
define the role of prosodics (as listed in section A of Table VII). We have also
cooperated with other ARPA/SUR contractors in many general aspects of the overall
large speech understanding programs (section B of Table VII), and have developed
and provided computer programs and other services to system builders (section C
of Table VII). Section D of Table VII (pages 42 and 43) shows that we contributed
in essentially all aspects of prosodics: intonation, stress patterns, phonetic
durations, rhythm, pauses, rate of speech, acoustic correlates of stress and


TABLE  VII.  CONTRIBUTIONS  OF  SPERRY  UNIVAC  TO  THE  ARPA 
SPEECH  UNDERSTANDING  RESEARCH  PROGRAM  (1972-1976) 


A. DEFINING THE ROLE OF PROSODICS IN SPEECH UNDERSTANDING

• Stressed syllables are important because they occur in important words,
  exhibit close phonemic-phonetic correspondence, are more carefully
  articulated and more reliably analyzed phonetically, are good indicators
  of syntactic structures, and are closely associated with predictable
  phonological distortions at various rates of speech.

• Various machine transcriptions of speech (i.e., results of automatic
  segmentation and labelling of speech) were analyzed, and it was shown that
  far fewer errors in vowel and obstruent classification occurred in
  stressed syllables than in unstressed or reduced syllables.

• Linguistic and perceptual arguments suggest that syntactic structures,
  detectable from prosodic patterns, should be used at early stages of
  speech understanding.

• Sentences which had been troublesome to the BBN speech understanding
  system were processed through the Sperry Univac prosodic analysis
  programs, and specific prosodic cues were found that could be used to
  determine the type of sentence and the specific syntactic bracketing
  intended by the talker.

• An overall strategy for prosodically-guided speech understanding has been
  specified. It involves use of stressed syllables as anchor points for
  reliable phonetic and phonemic analysis, restricting expensive acoustic
  analyses to those areas where prosodics say such analysis is needed,
  guiding the selection of applicable phonological rules, and detection of
  aspects of syntactic structure directly from prosodic patterns.


B. COOPERATIVE EFFORTS WITH OTHER ARPA/SUR CONTRACTORS

• Our syntactic and prosodic analysis of 250 sentences produced by SUS
  contractors resulted in the selection of the "31 ARPA Sentences", used in
  various common studies such as workshops on parameterization, speech
  segmentation, and phonological rules.

• Sperry Univac and other ARPA contractors have cooperated on major common
  tasks of selecting speech data bases, standardizing recording procedures
  and phonemic notations, compiling and applying sound structure rules,
  comparing speech parameterization techniques and speech segmentation
  results, and other comparative activities. In particular, a tutorial was
  presented on prosodic structures, prosodic information on selected data
  bases was supplied to several workshops, sample parameters and
  segmentation results were presented at the respective workshops, and
  sessions on prosodic structures were chaired at workshops on phonological
  rules.

• Sperry Univac actively participated in steering committee meetings and
  other activities guiding the overall SUR program. Dr. Mark Medress of
  Sperry Univac served as Assistant to the Chairman, and later as Acting
  Chairman of the Steering Committee.

• Sperry Univac was actively involved in the development of ideas for a
  five-year follow-on program to extend the current five-year SUR program.

• Sperry Univac produced nine semi-annual reports, five other
  ARPA-sponsored reports, 5 journal papers, 14 oral presentations and 25
  SUR NOTES describing our research (see Appendix B), and extensive
  communications over the ARPANET.


C. COMPUTER PROGRAMS, AND APPLICATIONS TO ARPA/SUR SYSTEMS

• Computer programs were developed and supplied to ARPA/SUR contractors,
  providing the following prosodic information:

  - Fo Contours
  - Intonational Phrase Boundaries
  - Syllabification
  - Stressed Syllable Locations

  These programs were implemented in the BBN system, and used for devising
  similar programs at SDC.

• A procedure has been developed for using prosodically-detected phrase
  boundaries to weigh word and phrase hypotheses in the Bolt Beranek and
  Newman (BBN) HWIM speech understanding system. The state-transition arcs
  of the augmented transition network grammar were specially marked if they
  were expected to be immediately preceded by intonationally-detected
  phrase boundaries. The scores on words associated with the arcs were
  increased if expected boundaries were detected, or decreased if expected
  boundaries were missing in the acoustic-prosodic data. Sixteen BBN
  sentences were processed through a computer program that detected phrase
  boundaries at fall-rise valleys in fundamental frequency contours.
  Analysis of sample traces of the hypothesizing, testing, and constructing
  of syntactic structures by the HWIM system showed that prosodic
  adjustment of scores would increase the likelihood of correct words and
  phrases being selected before incorrect ones. These ideas were later
  refined, tested further, and implemented in the HWIM system, but time
  didn't permit their full testing.


D. EXPERIMENTAL RESULTS

Perception of Prosodics

• Listeners can reliably perceive which syllables are stressed (with 6%
  confusion).

• Perceived stress patterns agree with those assigned when subjects are
  given only the written text.

• Certain words ("content words") are consistently perceived as stressed,
  while others ("function words") are perceived as unstressed or reduced.

• Repeated verbs or nouns in coordinate constructions have lower perceived
  stress levels than in simple constructions.

• Verbs, auxiliary verbs, and conjunctions have lower perceived stress
  levels when in subordinate phrase structures.

• Listeners can reliably perceive phrase boundaries in spectrally-inverted
  speech.

Intonational Phrase Boundaries

• Substantial Fo valleys occur at major phrase boundaries (before NP's,
  V's, ADV's, PP's, Clauses).

• About 80-90% of the major phrase boundaries in speech can be detected
  from Fo valleys.

• False boundary detections result from Fo variations near obstruents.

• The Fo-detected phrase boundary occurs just before the first stress in
  the following phrase.

Fo Contours

• Fo contours are a superposition of fall-rise clause contours, archetype
  phrase contours, Fo rises at stress positions, and Fo variations at
  obstruents (i.e., dips during voiced obstruents, and sudden jumps at
  unvoicing with subsequent rapid fall from high values).

• Fo contours within phrases, and phrase boundary breaks in the Fo contour,
  may be modelled by a statistical curve fitting procedure using a modified
  form of the gamma distribution.

• The Fo peak in a sentence is at the first stressed syllable of the
  sentence.

• Fo falls after the last stress of all declaratives, commands, and WH
  questions, and rises within and after the last stress of yes/no
  questions.

• Succeeding stresses have progressively lower Fo values, except at major
  syntactic boundaries, where Fo may rise for highly stressed or emphasized
  syllables.

• An unstressed syllable has lower Fo than all stresses that precede it
  within a clause.

• Parenthetical phrases are preceded and followed by long intersyllabic
  intervals, and marked by a Tune II rise, and large Fo variations at both
  ends of the parenthetical.

• Subordination of phrases does not appear to be readily detected from Fo
  contours.

• Glottal stops, detectable by large local variations in Fo contours, are
  twelve times more likely to occur before a stressed than an unstressed
  vowel in all-sonorant phonemic sequences. They also frequently mark
  phrase boundaries.

Syllabification

• Over 90% of the syllables in connected speech may be found from high
  energy nuclei surrounded by dips of 4 dB or more in energy.

• The beginning and ending of a syllabic nucleus may be quite accurately
  located at the outermost points where energy is at least half way above
  the dip in energy, towards the peak energy level in the nucleus.

• Syllables are not detected when, during all-sonorant sequences, the
  energy level does not dip adequately for syllable boundary detection.
  Another spectrally-weighted energy function or segmental information
  (such as formant transitions and automatic detections of non-vowel
  sonorants) might help locate such missing syllable boundaries.

Stressed Syllable Location

• Over 90% of the stressed syllables in connected speech may be located at
  those syllabic nuclei that have non-falling Fo and are highest in energy
  in the vicinity of either (a) the Fo rise to the peak value at the
  beginning of a phrase, or (b) local Fo rises about a gradually falling
  archetype line in the later part of a phrase.

• About 20% of all stress locations are false (not pointing to perceived
  stresses), due to selection of the wrong nucleus in a neighborhood of an
  Fo rise. Adjustments of durations, intensities, and Fo contours for vowel
  height and consonantal context may reduce such errors.

• Stressed syllable locations from using Fo rises alone or long-duration
  nuclei alone were found to be considerably less accurate and produce more
  false stresses than locations using the above "archetype contour"
  algorithm.

• The first stress in a sentence was found 99% of the time by locating the
  nucleus immediately preceding the peak Fo in the sentence.


Isochrony of Stresses

• Time intervals between stresses are a linear function of the number of
  intervening unstresses.

• Interstress intervals tend to cluster near about 0.4 seconds (i.e., tend
  toward isochrony of stresses) primarily because of the alternating
  stress/unstress pattern of English.

Structural Pauses

• Long periods of unvoicing (including periods of silence) occur between
  clauses.

• The duration of a pause is usually one rhythm unit (one average
  interstress interval) between clauses and two units between sentences.

• Hesitation pauses are usually substantially longer than structural
  pauses.

Phrase-Final Lengthening of Vowels and Sonorants

• Vowels and sonorant consonants are substantially lengthened in
  phrase-final positions.

• The phrase-final lengthening extends to groups of neighboring syllables
  including the last stress in a phrase and any subsequent unstresses up to
  the first stress in the next phrase plus, in some cases, the next earlier
  stress and neighboring unstresses. Hence detected phrase boundaries (at
  the end of the lengthened group) did not always occur at the time of the
  syntactic boundary.

• Over 90% of the phrase boundaries perceived by listeners who heard
  spectrally inverted speech were detectable from groups of lengthened
  syllables.

Interstress Intervals as Phrase Boundary Cues

• Over 96% of the major phrase boundaries perceived in spectrally inverted
  speech may be detected from long interstress intervals (* 0.6 seconds)
  spanning the boundaries.

Rate of Speech and Phonetic Distortions

• The time interval between two stresses was shown to be inversely
  correlated with the percentage of phones (between those stresses) that
  had been erroneously categorized by automatic labelling schemes. The
  interstress interval was demonstrated to be a better predictor of error
  rate (and, thus, a better indicator of applicable phonological rules)
  than other measures of speech rate, such as the number of syllables per
  second.

Prosodic Hypotheses

• A careful study of the literature and previous analyses of prosodic data
  resulted in a compilation of an extensive set of hypotheses and rules
  relating prosodic patterns (intonation, stress, rhythm, etc.) to
  linguistic structures (sentence types, syntactic bracketing and syntactic
  categories, phonetic sequences, semantic structures, etc.).

Database of Controlled Linguistic Contrasts

• A data base of 1100 sentences was designed to carefully isolate factors
  influencing prosodic and phonetic structures. A set of 178 "Phonetic
  Sentences" is especially suitable for testing automatic schemes for
  formant tracking, phonetic segment classification, and phonological rules
  application. A set of 922 "Protosyntactic Sentences" was designed such
  that various minimal pairs of sentences could isolate prosodic effects
  due to sentence type, syntactic bracketing, subordination, coordination,
  lexical stress patterns, semantic contrasts, and phonetic sequences. The
  data base includes sentences typical of those handled by the ARPA speech
  understanding systems.

• The database has been divided into small subsets that test specific
  prosodic hypotheses and linguistic contrasts. Initial subsets totalling
  255 sentences have been digitized and processed through prosodic programs
  to study such regularities as the placement of prosodically detected
  phrase boundaries as stresses move, which phrase boundaries are marked in
  Fo contours, perceived and automatically detected stress patterns, and
  overall Fo contours. Many further tests could be undertaken with this
  database.


boundaries, listeners' perceptions of prosodics, syllabification, and the
compilation of data bases and hypotheses for extensive controlled investigations
of prosodic structures. Contrary to much of past work with prosodics and some
current work in speech synthesis, which needs to focus only on the salient
features of acoustic prosodic data (cf. e.g., Allen and O'Shaughnessy, 1975;
O'Shaughnessy, 1976), these studies have been very exacting, with use of actual
acoustically-derived information such as computed fundamental frequency contours
(with all their inexactitudes, occasional octave errors, local perturbations,
etc.), sonorant energy functions, and automatic voicing decisions. Our computer
programs for prosodic analysis then directly use that imperfect information,
such as syllabification being based on sonorant energy contours, and stressed
syllable location being based on Fo and energy contours as well as the
syllabification results.

Basically, our effort has been two-pronged: (1) experimental research about
prosodic structures, and (2) cooperation with other ARPA contractors for general
tasks and for development of prosodic aids to the developing speech understanding
systems. In sections 4.4 to 4.9 we will review the experimental research. In
sections 4.2 and 4.3, we will review the cooperative efforts in speech understanding
system development.

4.2 Defining the Role of Prosodics in Speech Understanding Systems

Prior to the ARPA/SUR program, prosodic cues to sentence structure, and
prosodic aids to the location of reliable acoustic phonetic information, were
given little or no attention in speech recognition efforts. The strong
motivations for the use of prosodic patterns in speech recognition procedures were
thus presented in some detail in our first report (Lea, Medress, and Skinner,
1972a, section 2), and subsequent reports (notably, Lea, 1976b). In particular,
we showed that stressed syllables are of prime importance in speech recognition,
because of: (a) the occurrence of stressed syllables in semantically important
words; (b) the close correspondence between detected phonetic structure and
underlying phonemic structures in stressed syllables; (c) the much higher reliability
of phonetic classification possible in stressed syllables (as evidenced by the
analysis of results from the CMU Speech Segmentation Workshop); (d) the vital cues
to syntactic structure that stressed syllables provide; and (e) the close
association between time intervals between stresses (as rate-of-speech measures)
and applicable phonological rules.

In our first progress report (Lea, Medress, and Skinner, 1972a), we
reviewed linguistic and perceptual arguments that prosodic structures should
be used to detect aspects of syntactic structure independently of any phonemic
analyses and word matching algorithms. Linguistic arguments suggest that
phonetic sequences are not invariant linear strings that occur each time a word
is spoken, and, taken alone, they cannot be relied upon to provide all the
information needed for determining the word sequence in a sentence. These
arguments have been vividly verified by the demonstrations with the 1976 ARPA
speech understanding systems (especially at CMU), which showed that syntactic
constraints were very important in providing successful speech understanding.
Perceptual arguments indicate that human listeners, as successful archetypes of
speech understanding mechanisms, use phrase units at the earliest stages of speech
perception, and that those phrases are detected from prosodic information.
Our experiments have clearly confirmed the marking of large-unit linguistic
structures in prosodic patterns.

To further confirm the value of prosodics in speech understanding systems,
we conducted two studies, which, though they are experiments and thus might be
listed under section D of Table VII, have as their primary consequence the
demonstration of the valuable role of prosodics in speech understanding, and
thus are listed under section A of Table VII. We investigated various machine
transcriptions of speech resulting from the automatic segmentation and labelling
of speech provided by several research groups reporting at the 1973 CMU
Symposium of Speech Segmentation. We found that far fewer errors in vowel and
obstruent classification occurred in stressed syllables than in unstressed or
reduced syllables. In another study, sentences which had been troublesome to
the BBN speech understanding system were processed through our prosodic analysis
programs, and specific prosodic cues (Fo contours, detected phrase boundaries,
stressed syllable locations, pauses, and timing cues) were found that could be
used to determine the type of sentence and the specific syntactic bracketing
intended by the talker.

We also defined an overall strategy for speech understanding which uses stressed
nuclei as islands of phonetic reliability to be detected in early stages of
phonetic analysis, and uses prosodically-derived syntactic hypotheses to guide
syntactic parsing and an overall analysis-by-synthesis process.


It is interesting that, after our efforts to clearly define an important
role for prosodics in speech understanding systems, the Steering Committee of
the ARPA Speech Understanding Research program, in its mid-term review of the
total SUR program, suggested that a major attack should be mounted on the area
of prosodics, since this source of knowledge had not been used in any previous
system, though it offered the possibility of a unique contribution to sentence
disambiguation and overall system control strategies.

4.3 Cooperative Efforts to Advance the Development of Speech Understanding Systems


Besides providing solid arguments for the use of prosodics in speech
understanding, we have participated in various other aspects of system development.
Under the guidance of the Steering Committee, Sperry Univac has been engaged
in a variety of cooperative efforts to aid the progress of the overall SUR
program. Our syntactic and prosodic analysis of 250 sentences produced by the
system building contractors resulted in the selection of 27 generally interesting
sentences representative of the task domains being used in the various systems.
These ultimately formed the bulk of the "31 ARPA sentences" (Lea, Medress &
Skinner, 1973b) used in various common studies such as workshops on
parameterization, segmentation, and phonological rules.

Sperry Univac and other ARPA contractors have cooperated on major common
tasks that were helpful to the various systems and the research being conducted
within the program. These included: selecting speech data bases; standardizing
recording procedures; comparing speech parameterization techniques; developing
the uniform ARPABET phonemic notation (Medress, 1972, SUR Note 32); comparing
various methods of speech segmentation and labelling; and compiling, comparatively
evaluating, and applying sound structure (phonological) rules. At an early
stage in the program, Lea presented a tutorial on prosodic features and linguistic
structures, at the ARPA Seminar on Acoustic Phonetic Characteristics of English
Sentences. He also chaired sessions on prosodic phenomena at the workshops on
phonological rules.

Mark Medress of Sperry Univac served during part of the SUR program as
Assistant to the Chairman, and later as Acting Chairman, of the ARPA/SUR Steering
Committee. Sperry Univac also was very active in other Steering Committee
activities, including proposing specific ideas and plans for a five-year
follow-on program to extend and apply the results of the five-year SUR program.


A concrete product of Sperry Univac's work was the circulation of nine
regular ("semiannual") progress reports, five other ARPA-sponsored reports,
five journal papers, 14 oral presentations, and 26 SUR Notes.

While these various cooperative efforts within the SUR program required
a significant portion of our effort and other groups' efforts, they represent
one of the greatest benefits of the ARPA/SUR program. Little had been done before
this program to compare alternative methods in speech analysis or to encourage
close cooperation and even direct competition among speech research groups.
Interchange about parameterization techniques, speech segmentation procedures,
phonological rules, syntactic models, semantic and pragmatic constraints, and
system structures has significantly contributed to the success of the program and
the specific systems. The compilation and application of phonological rules
was a major contribution to speech sciences. It is worth noting, by the way,
that many of the selected phonological rules depend upon prosodic information
such as stress patterns.

Perhaps the most specific contributions of Sperry Univac's work to the
development of speech understanding were in providing prosodic analysis
routines and specific proposals of how to use prosodics in the ARPA/SUR systems.
We developed and circulated to the ARPA/SUR community a FORTRAN program for
obtaining an Fo value every 10 ms, based on a center-clipped autocorrelation
analysis (Skinner, 1973a, b). This program was subsequently modified and used
by other ARPA/SUR contractors, including BBN and SDC. Another program which
was delivered to SUR contractors was the FORTRAN program "BOUND3", which detects
syntactic boundaries from fall-rise valleys in Fo contours and long periods of
unvoicing ("pauses"). This program was incorporated into the BBN HWIM system,
and ideas from it were also used at SDC.
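
As a rough illustration of the kind of analysis described above, the following
Python sketch estimates one Fo value every 10 ms from a center-clipped
autocorrelation. It is not the FORTRAN program of Skinner (1973a, b): the
function names, the 30% clipping level, the 40 ms frame, the 60-400 Hz search
range, and the voicing threshold are all assumptions chosen for illustration.

```python
def center_clip(frame, fraction=0.3):
    """Zero the low-amplitude samples and shift the rest toward zero.

    The clipping level (a fraction of the frame's peak magnitude) is an
    assumed value, not one taken from the report.
    """
    if not frame:
        return []
    threshold = fraction * max(abs(s) for s in frame)
    clipped = []
    for s in frame:
        if s > threshold:
            clipped.append(s - threshold)
        elif s < -threshold:
            clipped.append(s + threshold)
        else:
            clipped.append(0.0)
    return clipped


def frame_f0(frame, sample_rate, f0_min=60.0, f0_max=400.0, voicing_ratio=0.3):
    """Return an Fo estimate in Hz for one frame, or 0.0 if judged unvoiced."""
    x = center_clip(frame)
    energy = sum(s * s for s in x)
    if energy == 0.0:
        return 0.0
    min_lag = int(sample_rate / f0_max)
    max_lag = min(int(sample_rate / f0_min), len(x) - 1)
    best_lag, best_value = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        # Autocorrelation of the clipped frame at this lag.
        value = sum(x[i] * x[i + lag] for i in range(len(x) - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    # Call the frame unvoiced when the best peak is weak (assumed criterion).
    if best_lag == 0 or best_value < voicing_ratio * energy:
        return 0.0
    return sample_rate / best_lag


def track_f0(samples, sample_rate, hop_ms=10.0, frame_ms=40.0):
    """Return a list of Fo values, one per hop (10 ms by default)."""
    hop = int(sample_rate * hop_ms / 1000.0)
    size = int(sample_rate * frame_ms / 1000.0)
    return [frame_f0(samples[start:start + size], sample_rate)
            for start in range(0, len(samples) - size + 1, hop)]
```

A value of 0.0 marks an unvoiced frame, so runs of zeros in the output can
stand in for the "long periods of unvoicing" mentioned above.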

A third FORTRAN computer program ("CHUNK") used sonorant energy contours to
locate the peaks, beginnings, and endings of syllabic nuclei, and to locate
syllable boundaries. This program was incorporated into the BBN HWIM system,
and a similar program, based in part on Paul Mermelstein's work at Haskins
Laboratories, was incorporated into the SDC system. Our FORTRAN program for
locating stressed syllables ("STRESS") was based on archetype Fo contours in
detected phrases and high values of energy integral in syllabic nuclei. This
program was incorporated into the BBN HWIM System.
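
The syllabification idea can be sketched in a few lines of Python. This is
only an illustration, not the CHUNK program itself: it assumes a sonorant
energy contour already expressed in dB per frame, and it applies the two
criteria listed in Table VII (nuclei separated by dips of 4 dB or more, and
nucleus edges at the outermost points where energy is at least half way from
the dip to the peak). The function name and the return format are hypothetical.

```python
def find_syllabic_nuclei(energy_db, min_dip_db=4.0):
    """Locate syllabic nuclei in a sonorant energy contour (dB per frame).

    Returns (nuclei, dips): nuclei is a list of (start, peak, end) frame
    indices, and dips is the list of frame indices taken as syllable
    boundaries. A peak counts only if energy falls by min_dip_db on both sides.
    """
    if not energy_db:
        return [], []
    peaks, dips = [], []
    seeking_peak = False   # start by tracking the initial minimum
    best = 0               # index of the current extreme (min or max)
    for i, e in enumerate(energy_db):
        if seeking_peak:
            if e > energy_db[best]:
                best = i
            elif energy_db[best] - e >= min_dip_db:
                peaks.append(best)          # peak confirmed by a deep enough fall
                seeking_peak, best = False, i
        else:
            if e < energy_db[best]:
                best = i
            elif e - energy_db[best] >= min_dip_db:
                dips.append(best)           # dip confirmed by a large enough rise
                seeking_peak, best = True, i
    if not seeking_peak:
        dips.append(best)                   # close the last syllable at the final minimum

    nuclei = []
    for k, peak in enumerate(peaks):
        left, right = dips[k], dips[k + 1]
        # Half-way levels between each neighboring dip and the peak.
        left_level = (energy_db[left] + energy_db[peak]) / 2.0
        right_level = (energy_db[right] + energy_db[peak]) / 2.0
        start = next(i for i in range(left, peak + 1)
                     if energy_db[i] >= left_level)
        end = next(i for i in range(right, peak - 1, -1)
                   if energy_db[i] >= right_level)
        nuclei.append((start, peak, end))
    return nuclei, dips
```

With this rule, two vowels joined by a sonorant whose energy dips less than
4 dB come out as a single nucleus, which is the kind of missed syllable
boundary reported for all-sonorant sequences.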


Last, but by no means least, of our efforts to apply prosodics to SUR
systems was our development of a procedure for using Fo-detected phrase
boundaries to adjust the scores on word and phrase hypotheses in the Bolt
Beranek and Newman (BBN) HWIM speech understanding system, so that correct words
and structural hypotheses will be proposed at earlier stages in parsing, and
erroneous theories can be avoided. The state-transition arcs of the augmented
transition network grammar were specially marked if they were expected to be
immediately preceded by intonationally-detected phrase boundaries. The scores
on words associated with the arcs were increased if expected boundaries were
detected, or decreased if expected boundaries were missing in the
acoustic-prosodic data. Sixteen BBN sentences were processed through the computer
program that detected phrase boundaries at fall-rise valleys in fundamental
frequency contours. Analysis of sample traces of the hypothesizing, testing, and
constructing of syntactic structures by the HWIM system showed that prosodic
adjustment of scores would increase the likelihood of correct words and phrases
being selected before incorrect ones. These ideas were later refined and modified
to handle BBN's new shortfall density scoring procedure, tested further, and
implemented in the HWIM system. BBN researchers planned to test the HWIM system
with and without prosodic information but were unable to perform such tests
before their contract ended. Still, as was noted in section 2.2, our studies
showed that prosodics caused a rearranging of the priorities of hypotheses such
that correct theories would have been tried earlier than without the prosodic
guidelines, so that false parsing paths could be avoided, parsing could be
more efficient, and more correct parses should result.
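
The score adjustment described above can be illustrated with a minimal
sketch. It is not the HWIM implementation: the record layout, the time
tolerance, and the sizes of the bonus and penalty are all assumptions chosen
for illustration, and BBN's shortfall density scoring is not modelled.

```python
from dataclasses import dataclass


@dataclass
class WordHypothesis:
    # All field names here are hypothetical, chosen for this sketch only.
    word: str
    start_time: float        # seconds; where the word is hypothesized to begin
    score: float             # score before any prosodic adjustment
    expects_boundary: bool   # True if the grammar arc was marked as being
                             # preceded by an intonational phrase boundary


def adjust_score(hyp, detected_boundaries, tolerance=0.10, bonus=0.1, penalty=0.1):
    """Return the score of `hyp` after a prosodic adjustment.

    The score is raised when an expected boundary was detected within
    `tolerance` seconds of the word onset, lowered when an expected boundary
    is missing, and left unchanged when no boundary was expected.
    """
    if not hyp.expects_boundary:
        return hyp.score
    found = any(abs(b - hyp.start_time) <= tolerance for b in detected_boundaries)
    return hyp.score + bonus if found else hyp.score - penalty
```

With such a rule, two competing word hypotheses with equal initial scores
are reordered as soon as only one of them agrees with the detected
boundaries, which is the rearranging of priorities discussed above.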

4.4  Intonation and Phrase Boundaries

An algorithm was devised for segmenting speech into grammatical phrases,
by marking phrase boundaries at the bottoms of "substantial" (see footnote 1)
fall-rise valleys in fundamental frequency (Fo) contours. This algorithm was
implemented as a FORTRAN program on the Sperry Univac interactive speech
research facility, then supplied over the ARPANET to all SUR contractors. It
uses Fo data obtained from the Sperry Univac fundamental frequency tracking
program (Lea, Medress, and Skinner, 1973a, Appendix A). The algorithm also
successfully detected clause and sentence boundaries wherever long
(350 millisecond) stretches of unvoicing (i.e., "pauses") occurred.


1. In earlier studies, a "substantial" Fo valley was considered to be defined
by a minimum of 7% fall and 7% rise in Fo. In the most recent studies, we used
4 eighth tones as the threshold value for an Fo valley.
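
The valley criterion can be sketched as follows. This is an illustrative
reading of the 7% fall and 7% rise rule and of the 350 millisecond pause
rule, not the FORTRAN boundary detection program itself; the function names,
the frame interval, and the treatment of the input as an already voiced Fo
track are assumptions.

```python
def find_valley_boundaries(fo, threshold=0.07):
    """Return indices of valley bottoms in a voiced Fo track (values in Hz).

    A boundary is marked at the lowest point of a valley when Fo has fallen
    by at least `threshold` from the preceding peak and then risen by at
    least `threshold` above the valley bottom.
    """
    boundaries = []
    peak = fo[0]        # highest Fo since the last boundary (or the start)
    valley = fo[0]      # lowest Fo seen since that peak
    valley_idx = 0
    for i in range(1, len(fo)):
        f = fo[i]
        if f < valley:
            valley, valley_idx = f, i
        elif valley <= peak * (1 - threshold) and f >= valley * (1 + threshold):
            # Substantial fall followed by a substantial rise.
            boundaries.append(valley_idx)
            peak, valley, valley_idx = f, f, i
        elif f > peak:
            # Still rising with no substantial fall: move the reference peak.
            peak, valley, valley_idx = f, f, i
    return boundaries


def find_pauses(voiced, frame_s=0.01, min_pause_s=0.35):
    """Return (start, end) frame index pairs of long unvoiced stretches.

    `voiced` holds one boolean per analysis frame; `frame_s` is an assumed
    frame interval in seconds. `end` is the first voiced frame after the run.
    """
    pauses, run_start = [], None
    for i, v in enumerate(list(voiced) + [True]):  # sentinel closes a final run
        if not v and run_start is None:
            run_start = i
        elif v and run_start is not None:
            if (i - run_start) * frame_s >= min_pause_s:
                pauses.append((run_start, i))
            run_start = None
    return pauses
```

Small dips that do not reach the threshold, such as those introduced by
obstruents, are ignored by the first function only to the extent that they
stay under 7%; larger local dips would still produce boundaries.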



A series of "natural experiments" (cf. Anderson, 1966) was conducted to
test the algorithm. In such "natural experiments", one does not directly
control an independent variable (such as syntactic bracketing) and study resultant
changes in a dependent variable (such as valleys in Fo contours); rather, one
simply looks at the data obtained from naturally-occurring phenomena (such
as the speech which had previously been recorded at Purdue University and
identified as the Rainbow Script, spoken by six talkers, and the Monosyllabic
Script, spoken by two talkers). Our first experiment included those texts,
plus 13 of the 31 ARPA sentences. For such speech texts, we demonstrated
that over 80% of all intuitively predicted syntactic boundaries were detected
from substantial fall-rise valleys in Fo contours. Over half of the "missing"
boundaries were between noun phrases and auxiliary or main verbs. In some
later reports we excluded such NP-AUX and NP-V boundaries, getting scores
nearer 90% for all other boundaries.

Some  "extra"  boundaries  were  detected  at  places  in  the  syntax  where  they 
had  not  been  expected,  and  some  "false"  boundaries  also  were  detected  where 
they  obviously  had  no  relation  to  syntactic  structures.  The  false  boundaries 
were  almost  all  due  to  local  Fo  variations  introduced  by  obstruents.  The 
"extra"  boundaries  and  "missing"  boundaries  (expected  but  not  detected)  needed 
to  be  better  understood,  yet  the  uncontrolled  nature  of  the  speech  texts  made 
it  difficult  to  find  simple  explanations.  More  controlled  studies  with 
sentence  pairs  with  minimal  differences  in  structure  needed  to  be  conducted. 

We  extended  those  tests  in  a later  study,  to  include  the  full  set  of  31 
ARPA  man-computer  interaction  sentences.  Boundary  detections  were  somewhat 
more  reliably  found  with  speech  read  from  a written  text  than  in  some  of  the 
simulated  man-computer  interactions.  This  test  still  involved  uncontrolled 
speech  texts. 

In 1975, after the large Sperry Univac speech database had been designed
and recorded, tests were conducted on the ability to detect phrase boundaries
in a subset of 159 designed sentences, involving three talkers. All the
sentences in this subset were simple (unembedded) declarative sentences with
one of six phrase structures. The majority were of the form "Ron will enroll
NP", to test how the Fo-detected boundary before the NP moves as the first stress
in the NP moves. Unfortunately, two of the talkers showed very little Fo variation
throughout each utterance, so that we could not conclusively determine which
syntactic constituents are separated by fall-rise patterns of fundamental
frequency. The main conclusion that resulted from this study was that the
rise in Fo after any detected boundary will begin at the first stress in the
following constituent (for all the talkers). The Fo-detected boundary
invariably occurs just before the first stress in the following phrase. Of
course, other factors like dips of Fo during nearby voiced obstruents could
cause local movement of the boundary, but the syntactic structure and stress
patterns dictate that the boundary be just before the first stress. In
all-sonorant sentences such as the 159 studied then, the placement of the boundary
immediately before the first stress of the next phrase becomes very apparent.


The most recent study of intonational phrase boundaries, described in
section 3.3 of this report, further verified this placement of the Fo boundary.
However, with the one talker used in this latest study, whose Fo contours
are more animated and show clear boundary markings, we were able to also
determine which constituents are regularly accompanied by Fo-detected boundaries.
After a stressed constituent, we find that noun phrases, sentence adverbs,
conjuncts, relative clauses, and parentheticals are preceded by Fo boundaries,
as are stressed main verbs (and auxiliary verb phrases if and only if they
contain a stress such as a negative). The two parts of a compound noun are
also separated by an Fo boundary. Since some words, like main verbs, lose
their stress in some constructions (e.g., in coordinate constructions and
subordinate phrases), those words will not be preceded by Fo boundaries in
such positions.

Boundaries were found in this latest study to occur more regularly when
unstressed syllables intervene between the last stress of the previous
phrase and the first stress of the following phrase. In general, in
predicting where Fo boundaries should occur, expected stress patterns should be
taken into account, so that boundaries will not be expected in coordinate
structures with repeated words, or in some subordinate structures. Even the
presence or absence of unstressed syllables between stressed syllables could be
used to refine the probability of detecting a phrase boundary. This study indicated
that boundaries should perhaps be expected at the first stress after the
auxiliary verb in a yes/no question, between a noun phrase and the relative
pronoun of its subordinate relative clause, and perhaps even between an adverb
and the adjective it modifies within a noun phrase.




The minimal contrasts in structure within pairs of the database sentences
have  been  useful  in  highlighting  these  refinements  in  boundary  prediction. 

In  summary,  it  appears  we  are  very  near  optimal  attainable  performance 
in  phrase  boundary  detection  from  Fo  contours,  with  few  possibilities  of 
improvement  by  revisions  in  the  computer  program.  For  some  ideas  about 
further  detailed  improvements  in  the  BOUND  3 phrase  boundary  detection  program, 
see  our  previous  semiannual  report  (Lea,  1976c).  However,  results  with  the 
255  sentences  suggest  that  the  places  where  boundaries  should  be  predicted 
could  deserve  further  study. 

In  addition  to  detection  of  phrase  boundaries  from  Fo  contours,  several 
other  studies  about  Fo  contours  have  been  conducted  at  Sperry  Univac.  In 
1973,  Lea  proposed  a model  of  Fo  contours  in  which  effects  due  to  clauses, 
phrasal  groupings,  stress  patterns,  and  phonetic  effects  were  superimposed. 

Our recent studies of Fo contours in the 255 all-sonorant sentences (section
3.5 of this report) confirm the general rapid-rise, gradual-fall intonation
of declarative, WH, and command clauses, and the terminal rise that sometimes
but not always accompanies yes/no questions. The peak of the Fo contour in
a clause was shown to occur during or just after the first stressed syllable
in the clause. Stressed syllables were almost always marked by local Fo
rises at or near their syllabic onsets, as predicted in Lea's model. The
effects of phonetic sequences on Fo contours are superimposed on the clause,
phrase, and stress effects, and are removable by use of all-sonorant sequences.
As has been repeatedly verified, Fo dips slightly during voiced obstruents,
and is initially high after unvoiced obstruents and then followed by a rapid
fall.


An interesting sidelight to our studies of fundamental frequency contours
in the 159 sentences (Lea, 1976c) was the observation that large fundamental
frequency variations occurred before many stressed word-initial vowels. These were
obviously the result of glottal stops. The glottal stop is often preceded
by a local rise in fundamental frequency, which suggests an acoustic (hence,
universal physiological) origin of rising tones that are often found to
precede glottal stops in tone languages. After a glottal stop, a rapidly
rising fundamental frequency, or other major perturbations of fundamental
frequency, may occur. Unvoicing may be apparent during the glottal stop.
Obviously, fundamental frequency variations thus may be indicative of the
occurrences of glottal stops, so that they may be distinguishable from oral
stops.

In addition, the results with the 159 sentences strongly indicate that
glottal  stops  are  more  likely  to  occur  before  stressed  vowels  than  unstressed 
ones.  If  a glottal  stop  occurs,  it  very  probably  precedes  a stressed  vowel, 
and  is  often  likely  to  be  just  after  a major  constituent  boundary.  The 
glottal  stop  is  thus  another  potential  cue  to  stress  and  constituent  structure. 

In another study of Fo contours, Dean Kloker (1976) showed that over
80% of the perceived phrase boundaries in spectrally inverted speech were
detectable by an Fo model that automatically locates and describes the shape
of Fo patterns throughout a sentence. He used the function
y(t) = a t^b e^(ct) to
model the rise-fall shapes which define phrases, with parameters a, b, and c
derived from a stepwise regression which adds new values to the region of a
phrasal contour as long as the variance of the fit is not too large and the
next Fo value is within a prediction interval. Twenty-one percent of the
boundaries found by the model were not found perceptually (i.e., were "false
alarms"). Kloker also found that sentence-final phrases marked as complete
clause boundaries by listeners were generally found to be falling or level
patterns, while those heard as incomplete were all found to be rising Fo
contours.
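
A rise-fall function with three parameters a, b, and c can be fitted by
linear regression once logarithms are taken. The sketch below assumes the
functional form y(t) = a t^b e^(ct), so that ln y = ln a + b ln t + c t; that
form is an assumption, since the printed formula is not fully legible. The
sketch uses an ordinary least-squares fit over a fixed region and does not
reproduce the stepwise growth of the region or the prediction-interval test
described above. All function names are hypothetical.

```python
import math


def solve3(A, y):
    """Solve a 3x3 linear system by Gauss-Jordan elimination with pivoting."""
    M = [row[:] + [rhs] for row, rhs in zip(A, y)]
    n = 3
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [x - factor * p for x, p in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def contour(t, a, b, c):
    """Assumed rise-fall shape: rises like t**b, then decays when c < 0."""
    return a * t ** b * math.exp(c * t)


def fit_rise_fall(times, fo):
    """Least-squares fit of ln y = ln a + b ln t + c t; returns (a, b, c).

    `times` must be positive (measured from the start of the phrase) and
    `fo` must be positive Fo values in Hz.
    """
    rows = [(1.0, math.log(t), t) for t in times]
    targets = [math.log(f) for f in fo]
    # Normal equations (X^T X) p = X^T z for p = (ln a, b, c).
    A = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    rhs = [sum(r[i] * z for r, z in zip(rows, targets)) for i in range(3)]
    ln_a, b, c = solve3(A, rhs)
    return math.exp(ln_a), b, c
```

Under this assumed form the contour peaks at t = -b/c when b > 0 and c < 0,
which gives a simple way to compare the fitted peak time against the
location of the first stressed syllable in the phrase.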

Our study of Fo contours in the contrasting structures of the "flying
planes paradigm" (page 36) showed a clear case of how Fo contours,
and the boundaries detected from them, can be used to distinguish between
alternative syntactic bracketings of a sentence. No boundary occurs between
the adjective and final noun in the NP-COPULATIVE-NP structure, while a
boundary does occur between the verb and final noun of the NP-AUX-V-N
structure. Many other structural contrasts can be possible where boundary
detections can distinguish among alternative structural hypotheses.

As boundary locations are more precisely predictable and we know more
specifically just which constituents will be accompanied by boundaries,
boundary detections can be used ever more effectively in hypothesizing
syntactic structures given the prosodic information. Then, such prosodic
adjustments of parsing paths as were attempted for the BBN HWIM system will
be possible and desirable.

4.5  Perceived Stress Patterns

Before we could evaluate any automatic procedure for locating stressed
syllables, we needed a "standard" specifying which syllables in speech are
actually stressed. In an initial study in 1973, we had three listeners
individually listen to portions of speech tapes, rewinding at will and
listening again until they could mark each syllable as either stressed, unstressed,
or reduced. Texts studied were the Rainbow Script spoken by six talkers,
the Monosyllabic Script spoken by two talkers, and 13 of the ARPA man-computer
interaction sentences (involving eight talkers). Each listener repeated
the perception test three times, with trials separated by several days. With
three repetitions with speech, three without speech (using only the written
text), three listeners, and with the various speakers involved, this study
involved a total of about 28,000 judgments of stress levels for syllables
in the connected texts.

As  expected,  the  different  listeners  sometimes  assigned  different  stress 
levels to the same syllables, presumably based on how they individually defined
the  boundaries  between  categories  of  stressed,  unstressed,  and  reduced  syllables. 
Their  confusions  were  not  seriously  increased  or  decreased  in  going  from 
individual  talker  to  talker,  or  from  text  to  text.  Two  listeners  were  found 
to  agree  in  their  perceived  stress  levels  for  most  of  the  individual  syllables. 
They  differed  on  only  about  % of  all  syllables  as  to  whether  they  were  stressed 
or  not,  and  each  of  them  showed  only  about  % confusions  in  decisions  about 
stressed  syllables  from  one  trial  to  another.  Unstressed  and  reduced  levels 
were  much  more  frequently  confused.  A third  listener  differed  from  the  other 
two  listeners  on  about  half  of  his  stress  level  judgments,  and  also  labelled 
substantial  percentages  of  all  syllables  as  stressed  on  one  trial  and  unstressed 
on  another.  Such  listeners  who  are  inconsistent  in  their  own  judgments  and 
who  differ  dramatically  from  other  listeners  should  be  excluded  in  any  attempts 
to  establish  standards  about  which  are  the  actual  "stressed  syllables"  in 
connected  speech. 



The listeners also appeared to be as consistent in their assignments
of stress levels given only the written text as they were in their assignments
when listening to the speech recordings. However, their judgments without
speech did not correspond well with their judgments with speech if the speech
was spontaneous (that is, not produced by speakers reading written texts).

Listeners appeared to differ most dramatically from each other, and to yield
more confusion in stress levels from repetition to repetition, when yes-no
questions were involved. (Later studies with more questions did not show such
a difference due to sentence type; Lea, 1976c.)

These initial studies were later extended to all the 31 ARPA sentences
(Lea, Medress, and Skinner, 1973b) with similar results. Then, in 1976, with
the completion of the design and recording of the large 3300-sentence speech
data base, further studies were conducted on listeners' perceptions of stress
patterns, both for enhancing our understanding of the method of obtaining
perceptions, and to supply stress judgments on a variety of sentence structures.

These  stress  judgments  provide  the  'standard'  of  correct  stress  assignment 
by  which  acoustic  correlates  of  stress  can  be  evaluated,  and  also  provide 
considerable  evidence  (involving  over  17,000  perceptions)  about  the  stress  levels 
accompanying  various  word  categories,  and  the  effects  of  syntactic  processes 
(e.g.,  subordination  and  coordination)  on  stress  patterns. 

By an initial experiment in which eleven listeners provided stress
perceptions on three separate trials (spaced one week apart), we demonstrated that
five new listeners who were substantially untrained about prosodic structures
could successfully (i.e., consistently, and with agreement among listeners)
categorize all syllables in connected speech as either stressed, unstressed,
or reduced. Good listeners may be selected on the basis of consistency from
time to time and agreement with other listeners. Listeners agree that, with
a few times of rewinding and listening to the clauses in a sentence, they
can effectively and meaningfully mark stress patterns. They usually listen
first for stressed syllables throughout clauses, then fill in decisions about
reduced and unstressed syllables.




It  appears  from  these  studies  that  the  relative  stressedness  of  syllables 
can  be  reliably  determined  from  counting  the  number  of  listeners  that  agree 
that  a syllable  is  stressed  (or  reduced),  thus  yielding  a "stress  score"  which 
is  highest  for  the  most  stressed  syllables  and  lowest  for  the  most  reduced 
syllables. Using such a stress score, we have demonstrated that WH-words,
nouns,  quantifiers,  and  command  verbs  are  among  the  most-stressed  words  in 
English  sentences,  and  that  main  verbs,  adverbs,  adjectives,  and  negatives 
are  also  usually  stressed.  Auxiliary  verbs,  copulatives,  pronouns,  relative 
pronouns,  and  possessive  determiners  are  usually  unstressed,  while  articles, 
prepositions,  and  conjunctions  are  usually  reduced.  Coordination  produces 
significant  reductions  in  stress  levels  on  repeated  parts  (verbs  or  nouns), 
and  subordination  of  one  clause  or  phrase  under  another  causes  reduction  in 
stress  scores  on  verbs,  auxiliary  verbs,  and  conjunctions,  but  not  nouns. 
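
A stress score of this kind can be computed directly from the listeners'
labels. The exact scoring formula is not given here, so the sketch below
makes an assumption: each "stressed" judgment counts +1, each "reduced"
judgment counts -1, and "unstressed" counts 0, so that the score is highest
for the most stressed syllables and lowest for the most reduced ones. The
label codes and function names are hypothetical.

```python
def stress_score(labels):
    """Return an assumed stress score for one syllable.

    `labels` holds one judgment per listener (or per listener trial), each
    being 'S' (stressed), 'U' (unstressed), or 'R' (reduced).
    """
    stressed = sum(1 for label in labels if label == "S")
    reduced = sum(1 for label in labels if label == "R")
    return stressed - reduced


def majority_stressed(labels):
    """True if more than half of the judgments mark the syllable as stressed."""
    return sum(1 for label in labels if label == "S") * 2 > len(labels)
```

Averaging such scores over all syllables of a given word category is one way
to obtain the kind of category ranking reported above.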

Another  structural  regularity  that  was  observed  was  that  perceived 
stresses  tend  to  decrease  throughout  a word  sequence  that  would  otherwise 
be  expected  to  have  equal  stresses,  or  a rising  stress  pattern;  stresses  on 
subject nouns are higher than on direct object nouns, and stresses on prenominal
modifiers  (adverbs,  adjectives,  and  participles)  show  a descending  stress 
pattern,  not  the  expected  nuclear  stress  pattern. 

In  addition  to  providing  extensive  experimental  evidence  about  English 
stress patterns, these studies have provided the necessary standard of correct
stress  assignment  by  which  acoustic  correlates  of  stress  can  be  evaluated. 

4.6  Automatic Location of Stresses

In 1972, we first proposed a strategy for locating stressed syllables.
Based on previous studies that had shown that local increases in Fo and
large integrals of energy within a syllable are the most reliable acoustic
correlates of stress, this algorithm looked for regions of high energy
integral near local Fo increases. The increasing Fo near the beginning
of each constituent detected by the boundary detector was assumed to be
attributable to the first stressed syllable in the constituent (Lea, 1973b,
section 5). A stressed "HEAD" to the constituent was thus associated with
a portion of the speech which is high in energy with rising Fo, and bounded
by substantial (5 dB or more) dips in energy. Other stressed syllables in the
constituent were expected to be accompanied by local increases in Fo. Since
the usual ("archetype") shape of the Fo contour in a constituent is a rapid
rise followed by a gradual fall in Fo, we expected that local 'increases'
in Fo due to later stressed syllables would show local rises above the
gradually falling Fo contour, even if Fo did not rise absolutely near the stressed
syllable. The stressed syllable is located within a high-energy-integral
region near this local rise above the archetype Fo contour.

This strategy was first specified precisely and used in rigorous hand
analyses but not implemented as a computer program until 1975. In the interim,
the algorithm was tested by hand analysis of Fo and energy contours for 400
seconds of connected speech; namely, the Rainbow Script spoken by six talkers,
the Monosyllabic Script spoken by two talkers, and, later, the 31 ARPA test
sentences. The algorithm succeeded in locating an overall average of around
85% of all syllables perceived as stressed by the majority votes of a
panel of listeners. Performance was best with speech read from written texts,
but even in the 31 ARPA man-computer interaction sentences, over 83% of the
perceived stresses were found. About 20% of all algorithmically located
"stresses" were false, in that they did not point to syllables perceived as
stressed by a majority of the listeners.
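
The two figures quoted above are different ratios: the first is taken over
the syllables perceived as stressed, the second over the locations proposed
by the algorithm. A minimal sketch of the bookkeeping, assuming syllables are
identified by index and using hypothetical names, is:

```python
def score_stress_locations(perceived, located):
    """Return (found_rate, false_rate) for one text.

    `perceived` is the collection of syllable indices judged stressed by a
    majority of listeners; `located` is the collection of syllable indices
    marked as stressed by the algorithm. Both must be non-empty.
    """
    perceived, located = set(perceived), set(located)
    hits = perceived & located
    found_rate = len(hits) / len(perceived)              # share of perceived stresses found
    false_rate = len(located - perceived) / len(located)  # share of locations that are false
    return found_rate, false_rate
```

Because the denominators differ, a found rate near 85% and a false rate near
20% can occur together without contradiction.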

It was conceivable that simpler procedures for stress location might work
as well as the archetype contour algorithm we had developed. Consequently, in
1973, Lea (1973P) did a comparison of three approaches to stressed syllable
location. Methods based on only the durations of high-energy chunks, or upon
only the length of time that fundamental frequency (Fo) was not falling
significantly, did not perform as well as the original algorithm based on archetype
Fo contours in phrases and local searches for high-energy chunks of speech.
The archetype contour algorithm was also least sensitive to the type of
sentence being processed, while the other algorithms showed quite different
performance in yes/no questions.

In March, 1975, the archetype algorithm was implemented as a FORTRAN
program ("STRESS") and distributed to SUR contractors. This represented a major
milestone in Sperry Univac's efforts to provide prosodic aids to speech
understanding. The implementation included a number of improvements and new tests
not included in the original algorithm, including refined methods for selecting
the highest-energy nucleus near a rise in Fo, a method of picking up some
stresses that were missed in very long phrases, and a special test for
prepausal stresses. The program was tested with the Rainbow and Monosyllabic
Scripts, spoken by two talkers, and the 31 ARPA sentences. On the average, 89%
of the syllables perceived as stressed were found by the program, while about
one out of five locations was false. The STRESS program confused about 15%
of all syllables between the stressed and unstressed categories, while listeners
confused about 5% of the syllables. While the program is obviously open to some
improvements, it is approaching the level of performance that listeners can attain.

In our most recent study of syllabification and automatic stress location
(see section 3.4), we found that $1% of all the syllables in the 255 database
sentences were correctly detected by the syllabification routine, while some
weak syllables in all-sonorant sequences were missed. These results were
surprisingly good, considering the difficulty of syllabification in
all-sonorant sequences. The overall scores of 92% correct stressed syllable location
and 20% false alarms were also very satisfying. There is, however, still
room for considerable improvement. One such improvement is expected to come
from a revision of the STRESS program to associate the first stress in a
sentence with the nucleus immediately preceding the peak Fo value in the
sentence. Another improvement would be a revision of the utterance-final
(prepausal) test for stresses. Other possible improvements were discussed in
section 3.4.

Steps  should  be  taken  in  future  studies  to  use  the  located  stresses  to 
guide  phonemic  analysis,  word  matching,  and  syntactic  parsing  processes. 

4.7  Timing Cues

In 1974, we conducted a study of rhythm and timing cues in our available
speech texts (Lea, 1974a). These studies suggested that stressed syllables
tend to occur at intervals of about 0.4 to 0.5 seconds (that is, there is some
tendency toward stress "isochrony"). However, the variation in interstress
interval sizes was quite large, even for a single talker within a single text.
We concluded that the concept of English being a stress-timed language is not
simply exhibited by exact equality of interstress intervals, or even by an
unquestionable "tendency toward equality" of interstress intervals regardless
of other factors. We found that, contrary to several published hypotheses,
the average interstress interval increases about linearly with the number of
unstressed syllables between the stresses. A tendency toward stressed-unstressed
alternation was exhibited, and it is probably this tendency, plus the somewhat
uniform durations expected for unstressed syllables, that yields the tendency for
interstress intervals to cluster somewhat near an average of 0.4 seconds or so.
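
The linear relation between interval size and the number of intervening
unstressed syllables can be checked with an ordinary least-squares line. The
sketch below is illustrative only; the function name is hypothetical and no
measured values from the study are reproduced.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through (xs, ys).

    Here xs would be the number of unstressed syllables between two stresses
    and ys the corresponding mean interstress interval in seconds.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x
```

Under strict isochrony the slope would be near zero; a clearly positive slope,
interpretable as the added duration per unstressed syllable, corresponds to
the linear increase reported above.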



This study also showed that pauses between clauses of a sentence tended to
be about the same duration as interstress intervals, while pauses between
sentences tended to be twice that duration. A pause is thus like an integer
multiple of an inserted silent interstress interval. We also found that time
intervals between detected syntactic boundaries tended to cluster in a
multimodal distribution centered around multiples of the average interstress interval.

These results indicate that interstress intervals, pause durations, and
intervals between detected boundaries all seem to relate to speech rhythm. Each
of these, plus a measure like the number of syllables per second, may be useful
as a measure of the rate of speech. Information about rate of speech may be
used in selecting the appropriate phonological rules to apply in determining
underlying phonemic structure from the slurred, coarticulated phonetic
sequences. "Fast speech" rules show more slurring, coarticulating, and
dropping of speech sounds.

We experimentally investigated how various measures of the rate of speech
correspond with changes in phonological structure that should be handled by
"fast speech" phonological and acoustic phonetic rules. The duration of the
interstress interval was found to inversely correlate with the percentage of
phones that were erroneously categorized by various available methods for
automatic phonetic categorization. Other measures of speech rate, such as
the number of syllables per unit time, were not as closely correlated with
phonetic error rates. The interstress interval thus appears to be useful in
predicting phonological rules that might apply to an utterance.

In  addition  to  such  phonological  use  of  rate  of  speech,  specific  rhythmic 
effects  such  as  interruptions  of  rhythm  (pauses,  "disjunctures",  etc.)  could 
be  useful  in  hypothesizing  the  grammatical  structure  of  a sentence. 

Two experiments were conducted that verified the occurrence of timing
cues to grammatical structure. In one experiment, five sentences per speaker
were selected from the speech of six individuals who participated in simulations
of computer interactions. The utterances were distorted by spectral inversion
and presented to five listeners who marked stressed syllables, and the locations
and types (normal or hesitation) of phonological phrase boundaries, using only
the prosodic cues remaining in the signal. Vowel and sonorant durations (with
and without aspiration) were measured from spectrograms, and then declared
stressed or unstressed based on the perceptions. Exploring the hypothesis that
large increases in phonetic duration are syntactically determined, perceived
boundary locations were compared with preceding segments which were 20% above
the median length for that segment type. Using a rule which groups lengthened
syllables, and from the lengthened group predicts phrase boundaries,  of
the perceived boundaries were predicted. Of all the perceived phrase boundaries,
those before silences longer than 200 milliseconds were more reliably predicted
by lengthening than boundaries not at long silences. Locations perceived to be
normal phonological phrase boundaries were more reliably predicted than those
perceived as hesitations. Of the predicted boundary locations not perceived
by listeners, some marked major syntactic boundaries, but most were at minor
syntactic breaks, notably between modifiers and nouns, and after prepositions.
The results also suggested that speaker differences and style variations may be
important.
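
The lengthening test in this experiment can be sketched as a comparison of
each segment's duration with the median duration for its segment type. The
sketch below covers only that test; the subsequent rule that groups
lengthened syllables and predicts a phrase boundary from the group is not
reproduced, and the data layout and names are assumptions.

```python
from collections import defaultdict
from statistics import median


def lengthened_segments(segments, factor=1.20):
    """Return indices of segments more than 20% longer than their type median.

    `segments` is a list of (segment_type, duration_in_seconds) pairs in the
    order in which the segments occur in the utterance.
    """
    durations_by_type = defaultdict(list)
    for seg_type, duration in segments:
        durations_by_type[seg_type].append(duration)
    medians = {t: median(d) for t, d in durations_by_type.items()}
    return [
        i for i, (seg_type, duration) in enumerate(segments)
        if duration > factor * medians[seg_type]
    ]
```

In practice the medians would be estimated per talker over a larger sample
than a single utterance, since the results above suggest that speaker
differences matter.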

In  another  experiment,  the  question  was  whether  or  not  one  could  detect 
major  phrase  boundaries  from  timing  of  prosodic  features  alone  (such  as  onsets 
of  syllabic  nuclei  'ound  from  energy  contours),  without  the  need  for  a prior 
determination  of  the  phonetic  sequence  or  the  detection  of  lengthening  of  phonetic 
segments.  We  have  already  noted  that  syntactically-dictated  pauses  appeared 
as  one-or  two-unit  interruptions  of  rhythm.  Inters tress  intervals  spanning 
those  pauses  were  thus  two  or  three  times  their  average  duration  within  clauses. 

In addition, long disjunctures (i.e., interstress intervals greater than 0.5
seconds) accompanied 95% of the perceived boundaries between phonological
phrases.  We  thus  do  have  quite  reliable  cues  to  linguistic  structure  in  the 
timing  of  speech  events. 
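
As a rough illustration of the long-disjuncture cue, the following Python sketch flags interstress intervals longer than 0.5 seconds. The function name and the input format (a list of stressed-syllable onset times in seconds) are assumptions made for this example; only the 0.5-second criterion is taken from the text.

```python
def long_disjunctures(stress_onsets, threshold_s=0.5):
    """Return (start, end) onset pairs whose interstress interval is long.

    stress_onsets: onset times of successive stressed syllables, in seconds
    (hypothetical input; the text does not prescribe a data format).
    """
    pairs = zip(stress_onsets, stress_onsets[1:])
    return [(a, b) for a, b in pairs if b - a > threshold_s]


# Toy onsets: a regular rhythm of about 0.35 s with one interruption.
onsets = [0.10, 0.45, 0.80, 1.50, 1.85]
print(long_disjunctures(onsets))
```

Each flagged pair marks a candidate location for a boundary between phonological phrases, without any prior determination of the phonetic sequence.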

Carefully  Designed  Speech  Databases 

From  our  initial  "natural  experiments"  with  the  Rainbow,  Monosyllabic, 
and  31  ARPA  speechtexts,  we  were  able  to  get  initial  evidence  suggesting 
specific  relationships  between  various  acoustic  prosodic  features,  on  the 
one  hand,  and  linguistic  structures,  perceptions,  and  abstract  notions  (such 
as boundaries, phrase structures, and rhythm, etc.), on the other hand.

However,  one  cannot  be  certain  with  such  natural  experiments  that  some  unknown 
third  variable  is  not  the  source  of  any  apparent  relationships  between  the 
acoustic  variable  and  the  uncontrolled  underlying  abstract  variable.  Controlled 
experiments,  with  all  variables  except  one  fixed  in  the  comparison  of  two 


utterances,  provide  the  proper  extension  from  the  encouraging  results  of  the 
natural  experiments.  We  consequently  undertook  the  design  of  speech  texts 
to provide the necessary controls and sufficient data to extend these
encouraging tendencies into well-defined rules relating prosodic variables and
linguistic  structure.  As  noted  in  the  previous  sections,  controlled  tests 
with  these  sentences  have  contributed  substantially  to  our  understanding  of 
prosodic  structures. 

There  is  a definite  need  to  develop  precise  rules  for  systematically 
relating  prosodic  patterns  to  underlying  structures.  If  one  can  understand 
how the interacting effects of semantics, syntax, lexical structures, stress
patterns, and phonetic sequences are superimposed in the Fo and energy
contours and time patterns of controlled English sentences, one has some
of the most essential tools for using acoustic prosodic data to guide speech
understanding strategies. Only by a systematic attack on the task of
compiling experimentally-verified rules can one hope to provide the kind of
reliability  needed  to  make  such  prosodic  data  of  major  value  in  speech 
understanding  systems.  For  example,  our  earlier  predictions  of  where  phrase 
boundaries  should  occur  in  Fo  contours  were  based  on  intuitive  analyses  of 
syntactic  structures.  Where  expected  boundaries  did  not  occur,  or  wherever 
false  or  unexpected  boundaries  occurred,  there  had  been  no  recourse  indicating 
the source of the error. This was in part due to the intuitive predictions
used,  and  in  part  due  to  the  uncontrolled  syntactic  structures  involved  in  the 
texts  studied.  Experiments  with  the  designed  sentences  of  known  syntactic 
structure  have  already  begun  to  indicate  exactly  what  structural  boundaries 
are  marked,  and  will  ultimately  permit  the  writing  of  precise  rules  predicting 
where  boundaries  will  occur  in  new  sentences  of  similar  structures.  These 
rules  for  predicting  detectable  boundaries  may  then  be  useful  in  computer 
determination  of  possible  underlying  structure  given  the  detected  boundaries. 
Similarly, precise rules for relating stress patterns to underlying structures
are  needed. 

Of  primary  importance  in  such  prosodic  studies  is  the  development  of  English 
intonation  rules.  Intonation  rules  cut  across  the  whole  gamut  of  problems 
involved in speech understanding, including the explanation of why constituent
boundaries  are  detectable  in  Fo  contours,  what  are  the  acoustic  correlates  of 
stressed  syllables,  how  syntactic  and  semantic  structures  might  be  manifested 


acoustically,  what  are  useful  phonological  rules  and  morphological  rules  in 
various  contexts,  and  how  stress  rules  might  be  inferred  from  acoustic  data. 

Many rules and hypotheses about regular prosodic patterns have been published,
but few have been tested with extensive speech data. We consequently designed
an extensive set of 922 sentences which provide "minimal pairs" of sentences
with nearly identical word sequences but contrasting structures. These
sentences  include  explicit  tests  of  the  prosodic  effects  of  sentence  type, 
contrastive  syntactic  bracketing,  subordination,  coordination,  syntactic 
categories (such as pronouns, verbals, compound nouns, etc.), movement of
stress  within  phrases,  coreference,  etc.  Prosodic  patterns  that  can  be  studied 
with  these  sentences  include:  performance  of  the  program  for  detecting  phrase 
boundaries  from  valleys  in  Fo  contours;  acoustic  correlates  of  stressed 
syllables, and performance in automatic stressed syllable location; acoustic
measures of rhythm and rate of speech; overall Fo contour shapes; and local
variations in prosodic features due to phonetic sequences. Also designed was
a set of 178 sentences which included all word-initial consonant-vowel (CV)
sequences  and  all  word-final  vowel-consonant  (VC)  sequences.  These  "phonetic- 
sequence  sentences"  provide  the  speech  data  needed  for  efficiently  testing 
automatic  procedures  for  vowel  and  consonant  classification.  For  example,  five 
sentences  provide  instances  of  all  distinguishable  stressed  vowels  of  American 
English, coupled with the sibilants (s, ʃ), in initial CV and final VC positions.

The  1100  designed  sentences  were  recorded  in  a pseudorandom  order  by 
three  male  talkers,  using  unusual  recording  procedures  that  involved  projecting 
the sentences one at a time on the wall of a soundproof room in which the
talker  was  situated.  Complete  dialect  information  was  obtained  for  the  three 
talkers.  Subsets  of  the  sentences  (including  99  by  one  talker,  37  by  another, 
and 255 by the third) were dubbed into a useful order for subsequent prosodic
analysis. 

From  extensive  studies  with  such  designed  sentences,  one  could  hopefully 
develop  experimentally-validated  intonation  rules  and  other  prosodic  rules. 

These rules would then be used to guide parsing, semantic analysis, phonological
analyses,  and  word  matching  procedures  in  future  speech  understanding  systems. 


Only a modest beginning on such studies has been completed within our ARPA program,
but  the  data  is  available  for  further  studies.  In  addition,  our  experience  with 
the  unusually  monotonic  speech  of  two  of  the  talkers  suggests  that  the 
sentences  ought  to  be  recorded  by  other  talkers. 

A report  about  the  designed  sentences  and  many  prosodic  hypotheses  that 
they can be used to test has just been published and should be of service in any
future  studies  with  the  large  speech  database.  I would  like  to  reiterate  here 
that  the  design  of  such  an  extensive  set  of  sentences  with  minimally-distinguished 
sentence  structures  is  a very  valuable  result.  To  design  and  record  such 
large  volumes  of  speech,  devise  hypotheses  to  test,  and  arrange  the  data  into 
subsets  for  analysis  is  a major  task  which  spanned  almost  three  years  at  Sperry 
Univac. Other researchers might avoid duplicate effort by adopting some of the
sentence  structures,  and  perhaps  even  the  speech  recordings,  for  their  studies. 


5.  CONCLUSIONS  AND  FURTHER  STUDIES 

Table VII (pages 41 to 43) lists the many specific accomplishments of
Sperry Univac's work on Prosodic Aids to Speech Recognition for ARPA. We
need not summarize such specifics again here. Rather, we shall consider the
general  conclusions  from  this  work,  and  final  suggestions  for  further  studies. 

5.1 Conclusions

Prosodic  information  can  and  should  be  used  to  provide:  reliable  anchor 
points  for  efficient  and  accurate  phonetic  analysis;  guidelines  for  phonological 
rule  applications;  segmentation  of  sentences  into  phrases;  indications  of 
syntactic  features  like  sentence  type,  subordination,  and  coordination;  and 
cues to semantic relations. We have shown not only why such use of prosodics
is important, but, to some degree, how prosodics can be incorporated into
speech  understanding  systems.  Specific  computer  programs  are  now  available 
for  obtaining  Fo  contours,  locating  syllabic  nuclei  and  syllable  boundaries, 
determining  which  syllables  are  stressed,  and  segmenting  speech  into  phrases. 

A general strategy for prosodically guided speech understanding would:
segment speech into phrases; locate the stressed syllables; do a phonetic
analysis anchored around the reliable stressed syllables and other islands
of phonetic reliability; hypothesize words that match that phonetic structure;
postulate syntactic structures that match the prosodic patterns of phrase
boundaries, stress patterns, timing, and intonation; hypothesize phrases or
word sequences that match the prosodic, segmental, and lexical information;
and verify semantic and pragmatic conditions. No such system has been developed,
and none of the ARPA systems come close to using prosodics to the degree we
have recommended. However, our initial cooperative effort with BBN, to
incorporate intonational phrase boundaries into the parsing procedures of
the  BBN  HWIM  system,  was  giving  encouraging  results  as  the  ARPA/SUR  program 
came  to  a close.  Also,  syllabification,  Fo  tracking,  and  even  rudimentary 
aspects  of  phrase  boundary  detection  were  incorporated  into  the  systems. 

Five years ago almost no mention was made of the role of prosodics in
speech understanding systems. Prosodics were not listed among the major
knowledge  sources  or  "levels"  of  system  organization  in  the  original  report 
of  the  ARPA  study  group  that  defined  the  ARPA/SUR  program.  One  would  have  to 
look  long  and  in  a variety  of  directions  to  hear  or  see  even  any  "lip 

service" given to the possibility that prosodics should play any role in
speech recognition.¹ Indeed, prosodics was at an infant stage comparable
to that which phonemic structures had in the word recognition efforts of the early
1950's. Prosodics had less acceptability in 1971 than syntax enjoyed in Lindgren's
1965 survey of work on speech recognition. Now, prosodics have reached such
a level  of  acceptability  and  attention  that  one  rarely  hears  a general 
prediction  of  future  work  without  a substantial  (though  not  always  a well- 
informed)  acknowledgement  of  the  need  for  a prosodic  knowledge  source  in  the 
system.  In  a review  of  one  of  my  recent  papers,  the  reviewer  put  the 
question  of  whether  to  use  prosody  as  now  a foregone  conclusion,  and 
considered  that  now  it  is  a question  of  how  to  use  it.  To  a long-term 
advocate  of  such  a "weak-sister"  in  the  array  of  speech  processing  tools 
this is a heartening accomplishment.

Sperry Univac's efforts in the ARPA program began as basic supportive
research on prosodics and their relationships to speech understanding systems.
Only at a late stage in the program did the pressures for successful systems
and closer cooperation among contractors lead us into concerted efforts to
incorporate our ideas and experimental results into the systems being
developed by other ARPA contractors. In retrospect, if such practical
application of prosodics within systems was to be accomplished primarily by
us, the researchers on prosodics, rather than by the system builders, such
an orientation should have been taken earlier. We did not quite bring our
experiments to practical application until shortly before the systems had
to be frozen for performance evaluation.

Consequently, our biggest accomplishments were in the area of experimental
studies of prosodic structures. This is evidenced by the long list
of experimental results listed in Table VII (pages 41 to 43). We provided
solid experimental evidence for what were intuitively accepted notions about
the value of prosodics; namely, that any of various available methods of
automatic labelling of phonetic segments worked best in the carefully
articulated stressed syllables, that intonation provides cues to phrase structure,
that stress patterns can be used to directly detect some aspects of syntactic

¹ A notable exception was the work of the late Gordon Peterson (1961, 1963),
perhaps the earliest spokesman for the use of prosodics in speech recognition.


structure,  and  that  stresses  are  crucial  to  the  rhythm,  rate,  and  prediction 
of phonological distortions in speech.

We experimentally confirmed or disproved linguists' and theorists'
rules about expected stress patterns, intonation contours, pauses, rhythms,
and perceptions of prosodics. Some of our results have major theoretical
significance to linguists and speech scientists. For example, contrary
to Bolinger's published claims (1965, 1972), there is a neutral,
syntactically-determined intonation contour and stress pattern for spoken
English sentences.
Contrary to the claims of Pike (1945) and other linguists, isochrony of
English stresses is not exhibited by simple squeezing of unstresses between
fairly fixed onset times of stressed syllables. Some aspects of claimed
"nuclear stress patterns" (cf. Chomsky and Halle, 1968) are not confirmed by
either perceived stress patterns or acoustic correlates. Despite criticisms
(Armstrong and Ward, 1929; Lieberman, 1967) of the close association assumed
between constituent structures and invariant prosodic signals, there is
considerable evidence that major syntactic boundaries are reliably marked by
pauses, Fo contours, phrase-final lengthening of vowels and sonorants, longer
interstress intervals, and even specific phonetic segments like glottal stops.

A major  challenge  to  total  language  models  is  the  distinction  between  the 
positions  of  surface  syntactic  boundaries  and  the  displaced  indicators  of 
those  boundaries  in  Fo  contours  and  groups  of  lengthened  syllables.  Similarly, 
though  the  subject-predicate  boundary  is  considered  among  the  major  syntactic 
breaks in a sentence (cf., e.g., Scholes, 1971), that boundary is one of the
least  detectable  from  the  prosodic  patterns  we  have  studied. 

Our  research  has  spanned  the  whole  gamut  of  prosodic  structures,  and  I 
believe  it  provides  vital  background  for  further  work  on  prosodic  aids  to 
speech recognition. We are very close to where prosodics can be used to
provide  valuable  aids  to  phonological  analysis,  word  matching,  and  parsing. 
Indeed,  if  a system  builder  cannot  now  accept  the  need  for  a total  prosodically 
guided  speech  understanding  strategy  such  as  we  have  proposed,  he  should  at 
least  give  careful  consideration  to  incorporating  a "prosodic  verifier" 
which compares expected prosodic patterns for hypothesized word sequences with
actual  detected  patterns,  and  thus  adjusts  scores  of  alternative  hypotheses. 
Such  ideas  are  being  explored  by  Sperry  Univac  under  internal  funding. 


Many  of  our  most  solid  results  about  prosodic  structures  have  come  from 
the  recent  use  of  our  database  of  sentences  with  minimal  pairs  of  contrasting 
structures.  Such  designed  sentences  should  play  a valuable  role  in  any 
prosodics research or development of prosodic aids to speech understanding.

5.2 Further Studies

If  I had  my  unrestricted  choice  and  the  necessary  resources  to  undertake 
a program  in  prosodic  aids  to  speech  recognition  today,  I would  do  the  following, 
and  I obviously  recommend  this  approach  to  interested  researchers.  I would 
have  a two-prong  effort:  (1)  conducting  necessary  experimental  research  on 
prosodic regularities; and (2) developing a predominantly prosodically-guided
speech  understanding  system  which  can  progressively  incorporate  more  prosodic 
information. 

We do not know all we need to know to be able to simply apply prosodics
to speech understanding without simultaneous further research. Anyone who
would promote prosodic aids without further experimentation would be in
danger of slowing the ultimate progress of prosodically guided systems by a
premature application of limited information. Such a tactic may even lead
to discouragement about prosodics, resulting from ill-devised and improperly
applied limited tools. We need to know more precisely: just which constituents
are demarcated by Fo contours and other prosodic cues; what intonation can tell
us about sentence type, subordination, coordination, and special phrase structures;
how to remove or handle phonetic influences on prosodics; what stress patterns can
actually be expected with various phrase structures; how to use timing cues to
select phonological rules; etc. We also need to test prosodic regularities with
more talkers, other speech styles, and more repetitions per talker. In general,
useful rules for relating prosodic patterns to linguistic structures must be
experimentally developed.

On  the  other  hand,  the  development  of  useful  rules  for  relating  prosodic 
patterns  to  linguistic  structure  also  demands  the  direct  application  of  those 
rules to working systems, to evaluate their accuracy and utility. I would
recommend a system structure that makes use of prosodics from the very beginning of
system implementation. Two alternative beginnings are (A) a prosodically
guided speech understanding system such as we have previously defined (Lea,
1974; Lea, Medress & Skinner, 1975); or (B) a more standard system with
acoustic  analysis,  phonetic  segmentation,  word  matching  and  scoring,  and 
appropriate  parsing  and  control  structures  that  hypothesize  and  test  word 
sequences,  but  with  a "prosodic  verifier".  The  prosodic  verifier  would 

compare  expected  stress  patterns  with  detected  stress  patterns  to  adjust  word 
scores,  compare  expected  and  Fo-detected  phrase  boundaries  to  adjust  scores 
on  word  hypotheses,  compare  prosodic  indicators  of  sentence  type  with  the 
hypothesized  type  of  sentence,  etc.  Either  system  could  use  stressed  syllables 
as  phonetically-reliable  anchors  around  which  a search  for  occurrences  of 
words  can  be  attempted,  and,  if  convenient,  they  could  restrict  expensive 
acoustic  analyses  such  as  LPC  spectral  analysis  to  only  those  regions  (voiced 
regions or maybe only stressed syllables) where prosodics could suggest that
such  analysis  is  needed. 
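
One possible shape for such a "prosodic verifier" is sketched below in Python. This is a hypothetical illustration, not an implemented Sperry Univac component: the `Hypothesis` fields, the per-syllable representation of stress, the boundary indices, and the penalty weights are all assumptions chosen for the example. It shows only the idea of lowering the score of a word-sequence hypothesis for each disagreement between expected and detected prosodic patterns.

```python
from dataclasses import dataclass, field


@dataclass
class Hypothesis:
    """A hypothesized word sequence with its expected prosodic pattern."""
    words: str
    score: float                      # score from word matching and parsing
    expected_stress: list             # one bool per syllable
    expected_boundaries: set = field(default_factory=set)  # syllable indices


def verify(hyp, detected_stress, detected_boundaries,
           stress_penalty=1.0, boundary_penalty=2.0):
    """Return the hypothesis score after prosodic penalties.

    The penalty weights are arbitrary placeholders; suitable values would
    have to be determined experimentally. If the two stress lists differ in
    length, only the overlapping syllables are compared.
    """
    stress_mismatches = sum(
        expected != detected
        for expected, detected in zip(hyp.expected_stress, detected_stress))
    boundary_mismatches = len(
        hyp.expected_boundaries ^ set(detected_boundaries))
    return (hyp.score
            - stress_penalty * stress_mismatches
            - boundary_penalty * boundary_mismatches)


def rerank(hypotheses, detected_stress, detected_boundaries):
    """Order hypotheses from best to worst adjusted score."""
    return sorted(
        hypotheses,
        key=lambda h: verify(h, detected_stress, detected_boundaries),
        reverse=True)


h1 = Hypothesis("first reading", 10.0, [True, False, True], {1})
h2 = Hypothesis("second reading", 11.0, [False, True, False], {0})
best = rerank([h1, h2], [True, False, True], [1])[0]
print(best.words)
```

In this toy case the second hypothesis starts with the higher score but is demoted because neither its stress pattern nor its phrase boundary agrees with what was detected.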

The  system  should  be  used  initially  for  very  restricted  tasks  with 
few syntactic structures, comparable to or more restricted than those used
in  the  successful  HARPY  and  HEARSAY  systems.  Later  work  could  deal  with 
more  challenging  tasks  such  as  the  versatile  subset  of  English  handled  within 
the BBN HWIM system.

We need to define precise ways of using prosodics in word matching and
parsing.  Can  one  reliably  rule  out  words  from  hypothesized  occurrence  at 
a certain  point  in  an  utterance,  based  on  the  wrong  syllables  being  stressed 
or  phrase  boundaries  occurring  where  they  are  not  expected?  Can  one  rule  out 
(or reduce the score on) possible phrase structures because the phrase
boundaries  that  were  detected  are  at  radically  different  places  from  those 
predicted  for  those  structures?  Even  for  very  restricted  speech  recognition 
systems  these  ideas  would  seem  worth  incorporating  and  testing. 

On the experimental side, I would continue testing the BOUND3 Fo-boundary
detector,  the  syllabification  routine,  and  the  STRESS  stressed  syllable  locator, 
using  the  remainder  of  our  922  "Prososyntactic  Sentences"  spoken  by  talker 
WAL,  and,  very  soon,  introduce  other  talkers  for  the  same  sentences.  Later 
I would  introduce  repetitions  of  the  same  sentence  by  the  same  talker.  Other 
syntactically  and  prosodically  informative  sentences  might  then  be  added. 

After  some  concrete  results  with  several  talkers,  I would  explore  similar 
questions  with  other  speech  styles. 

Early attention should be directed toward the following questions:
Which constituents are marked by prosodic boundaries? What is the success
in boundary detection, syllabification, and stressed syllable location?
What regularities are to be found in initial, terminal, and medial Fo contours


within  clauses?  Can  one  find  more  adequate  procedures  for  extracting 
syntactic  and  stress-related  aspects  from  phonetic  influences  on  Fo  contours? 
From the beginning, one should see a primary goal of developing
experimentally-verified intonation rules. About the time that more talkers
are introduced,
I would recommend thorough studies of all acoustic correlates of stress,
with the thought of improving or replacing the current stress location
program. Studies of rhythm, rate, and the use of interstress intervals to
predict applicable phonological rules should be undertaken by the time that
significant data is available from several talkers.

In  essence,  I am  saying  that  we  are  in  the  middle  of  the  necessary 
experimental  research  about  prosodic  structures,  with  considerable  work  yet 
to  be  done,  but  with  the  possibility  of  promptly  beginning  to  apply  restricted 
prosodic information within a speech understanding system. While the ARPA/SUR
program is over, and with it the excellent interactions, cooperative
spirit, and interchange of ideas that have so much permeated that program,
the need for speech understanding systems and for prosodic guidelines will
not diminish, but rather increase. Speech understanding remains one of the
most challenging potential users of prosodics, though, as I have noted
previously (Lea, 1976c, pp. 49-50), prosodics can also be used in other
systems  for  word  and  concept  spotting,  language  identification,  speaker 
identification,  and  speech  synthesis. 


6.  REFERENCES 

ALLEN, J. and O'SHAUGHNESSY, D. (1975). "Fundamental Frequency Contours of
Auxiliary Phrases in English," MIT/RLE Technical Report, Massachusetts
Institute of Technology, Cambridge.

ANDERSON, B. F. (1966). The Psychology Experiment. Belmont, California:
Brooks/Cole Publishing Company.

ARMSTRONG, L. E. and WARD, I. C. (1926). Handbook of English Intonation.
Cambridge: Heffer (2nd Edit.).

BOLINGER, D. L. (1965). Forms of English: Accent, Morpheme, Order. Cambridge:
Harvard Univ. Press.

BOLINGER, D. L. (1972). "Accent is Predictable (if You're a Mind Reader),"
Language, vol. 48, 633-644.

CHEUNG, J. Y. (1975). "Computer Estimates and Modelling of Linguistic
Stress Patterns in Speech," Ph.D. Dissertation (EE Tech. Report No. 188),
Dept. of Electrical Engineering, Univ. of Washington, Seattle.

CHOMSKY, N. and HALLE, M. (1968). The Sound Pattern of English. New York:
Harper and Row.

KLOKER, D. R. (1975). "Vowel and Sonorant Lengthening as Cues to Phonological
Phrase Boundaries," presented at the 89th Meeting, Acoustical Society of
America, Austin, Texas, April 8-11. Abstract in J. Acoust. Soc. America,
vol. 57, Supp.

LEA,  W.  A.  (1972).  "Intonational  Cues  to  the  Constituent  Structure  and 
Phonemics  of  Spoken  English,"  Ph.D.  Dissertation,  School  of  Electrical 
Engineering,  Purdue  University. 

LEA, W. A. (1973a). "Syntactic Boundaries and Stress Patterns in Spoken English
Texts," Univac Report No. PX 10146. Sperry Univac DSD, St. Paul, Minnesota.

LEA, W. A. (1973b). "Segmental and Suprasegmental Influences on Fundamental
Frequency Contours." In Consonant Types and Tone (Proceedings of the First
Annual Southern California Round Table in Linguistics, Ed. by L. Hyman).
Los Angeles: University of Southern California Press.

LEA, W. A. (1973c). "An Approach to Syntactic Recognition Without Phonemics,"
IEEE Trans. on Audio and Electroacoustics, vol. AU-21, 249-358.

LEA, W. A. (1973d). "Evidence that Stressed Syllables Are the Most Readily
Decoded Portions of Continuous Speech," Paper presented at the 86th Meeting
of the Acoustical Society of America, Los Angeles, California.

LEA, W. A. (1973e). "Perceived Stress as the 'Standard' for Judging Acoustical
Correlates of Stress," Paper presented at the 86th Meeting of the Acoustical
Society of America, Los Angeles, California.

LEA, W. A. (1973f). "An Algorithm for Locating Stressed Syllables in Continuous
Speech," Paper presented at the 86th Meeting of the Acoustical Society of
America, Los Angeles, California.


LEA, W. A. (1974a). "Prosodic Aids to Speech Recognition: IV. A General
Strategy for Prosodically-Guided Speech Understanding," Univac Report No.
PX 10791, Sperry Univac DSD, St. Paul, Minnesota.

LEA, W. A. (1974b). "Sentences for Controlled Testing of Acoustic Phonetic
Components of Speech Understanding Systems," Univac Report No. PX 10QS2.
Sperry Univac DSD, St. Paul, Minnesota.

LEA, W. A. (1975a). "Isochrony and Disjuncture as Aids to Syntactic and
Phonological Analysis," presented at the 89th Meeting, ASA, Austin, Texas.

LEA, W. A. (1975b). "Prosodic Aids to Speech Recognition: VII. Experiments
on Detecting and Locating Phrase Boundaries," Univac Report No. PX 11S54.
Sperry Univac DSD, St. Paul, Minnesota.

LEA, W. A. (1976a). "Acoustic Correlates of Stress and Juncture," Univac
Report No. PX 11693. Sperry Univac DSD, St. Paul, Minnesota.

LEA, W. A. (1976b). "The Importance of Prosodic Analysis in Speech
Understanding Systems," Univac Report No. PX 11694. Sperry Univac DSD,
St. Paul, Minnesota.

LEA, W. A. (1976c). "Prosodic Aids to Speech Recognition: VIII. Listeners'
Perceptions of Selected English Stress Patterns," Univac Report No. PX 11711.
Sperry Univac DSD, St. Paul, Minnesota.

LEA, W. A. (1976d). "Use of Intonational Phrase Boundaries to Select
Syntactic Hypotheses in a Speech Understanding," presented at 92nd Meeting,
Acoustical Society of America, San Diego. J. Acoust. Soc. America, vol. 60,
Supplement 1, Fall 1976, 312(A).

LEA, W. A. (1976e). "Sentences for Testing Prosodic and Syntactic Components
of Speech Understanding Systems," Univac Report No. PX 10953. Sperry Univac
DSD, St. Paul, Minnesota.

LEA, W. A., MEDRESS, M. F., and SKINNER, T. E. (1972a). "Prosodic Aids to
Speech Recognition: I. Basic Algorithms and Stress Studies," Univac Report
No. PX 7940. Univac Park, St. Paul, Minnesota.

LEA, W. A., MEDRESS, M. F., and SKINNER, T. E. (1972b). "Use of Syntactic
Segmentation and Stressed Syllable Location in Phonemic Recognition."
Presented at the 84th Meeting, Acoustical Society of America, Miami Beach,
Florida, Nov. 27-30, 1972.

LEA, W. A., MEDRESS, M. F., and SKINNER, T. E. (1973a). "Prosodic Aids
to Speech Recognition: II. Syntactic Segmentation and Stressed Syllable
Location," Univac Report No. PX 1 02^2. Sperry Univac DSD, St. Paul, Minnesota.

LEA, W. A., MEDRESS, M. F., and SKINNER, T. E. (1973b). "Prosodic Aids
to Speech Recognition: III. Relationships between Stress and Phonemic
Recognition Results," Univac Report No. PX 10430. Sperry Univac DSD,
St. Paul, Minnesota.


LEA, W. A., MEDRESS, M. F., and SKINNER, T. E. (1975). "A
Prosodically-Guided Speech Understanding Strategy," IEEE Trans. Acoustics,
Speech, and Signal Processing, vol. ASSP-23, 30-38.

LIEBERMAN, P. (1967). Intonation, Perception, and Language. Cambridge:
M.I.T. Press.

MEDRESS, M. F. (1972). "Revised Computer Phonetic Transcriptions," ARPA/SUR
unpublished SUR Note No. 32, Sperry Univac DSD, St. Paul, Minnesota, April,
1972.

OLLER, D. K. (1973). "The Effect of Position in Utterance on Speech
Segment Duration in English," J. Acoust. Soc. America, vol. 54, 1235-1247.

O'SHAUGHNESSY, D. (1976). "Fundamental Frequency by Rule for a
Text-to-Speech System," presented at the 92nd Meeting, Acoust. Soc. of America,
San Diego. J. Acoust. Soc. of America, vol. 60, Suppl. No. 1, S7b(A).

SARGENT, D. C. (1975). "Computer Algorithms for the Extraction and
Application of Stress Contours from Continuous Speech Sentences," Report
No. TR-EE 79-44. School of Electrical Engineering, Purdue University,
West Lafayette, Indiana, 47907.

SCHOLES, R. J. (1971). Acoustic Cues for Constituent Structure. The
Hague: Mouton and Co.

SKINNER, T. E. (1973a). "Speech Parameter Extraction: Fundamental
Frequency, Spectral and Formant Frequency Processing," Univac Report No.
PX 10T76, Univac Park, St. Paul, Minnesota.


7.  APPENDICES 


APPENDIX  A.  BOUNDARIES  AND  STRESS  PATTERNS  IN  THE  255  DATABASE  SENTENCES 

In the following pages of this Appendix, the 255 sentences used in our recent
studies  of  acoustic  prosodic  patterns  are  listed.  The  sentences  are  grouped 
into  subsets  which  test  specific  syntactic  or  lexical  effects  on  prosodic 
structures.  Each  sentence  is  preceded  by  an  identifier  consisting  of:  (a)  the 
letters  "PSS"  meaning  "prososyntactic  sentence",  in  contrast  to  phonetic 
sentences,  etc.;  (b)  either  S (short)  M (medium),  L (Long),  or  I (referring 
to  phonetic  or  extra  structural  tests),  along  with  a number  identifying  the 
sentence's  place  in  the  ordered  description  of  the  database;  (c)  a prediction 
of  the  stress  pattern  (stressed  or  not  for  each  syllable),  with  parentheses 
around  phrases;  and  (d)  a tree  number  (e.g.,  T4),  indicating  the  syntactic 
tree  that  represents  the  surface  structure  of  the  sentence.  These  identi- 
fiers are  described  more  fully  in  another  report  (Lea,  19?6e). 
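
The identifier layout described above can be illustrated with a short sketch. This is not code from the report: the names `SentenceId`, `parse_identifier`, and `predicted_stresses` are hypothetical, and the regular expression assumes identifiers of the form "PSS S137-SU(US)(SS),T4" (optionally with a footnote asterisk after the stress pattern). Identifiers that carry a phonetic label instead of a stress pattern are not handled.

```python
import re
from typing import List, NamedTuple


class SentenceId(NamedTuple):
    size_class: str      # "S", "M", "L", or "X"
    number: int          # place in the ordered description of the database
    stress_pattern: str  # e.g. "SU(US)(SS)"; S = stressed, U = unstressed
    revised: bool        # True if the pattern carries a footnote asterisk
    tree: str            # e.g. "T4"


# Assumed shape: PSS <class><number>-<pattern>[*],T<tree number>
_ID_RE = re.compile(
    r"^PSS\s+([SMLX])(\d+)\s*-\s*([SU()\s]+?)\s*(\*?)\s*,\s*(T\d+)$"
)


def parse_identifier(text: str) -> SentenceId:
    """Split one identifier into its four described parts."""
    match = _ID_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized identifier: {text!r}")
    pattern = re.sub(r"\s+", "", match.group(3))  # drop spacing inside pattern
    return SentenceId(
        size_class=match.group(1),
        number=int(match.group(2)),
        stress_pattern=pattern,
        revised=bool(match.group(4)),
        tree=match.group(5),
    )


def predicted_stresses(pattern: str) -> List[bool]:
    """One flag per syllable: True where stress is predicted."""
    return [ch == "S" for ch in pattern if ch in "SU"]
```

For example, `parse_identifier("PSS S137-SU(US)(SS),T4")` yields size class "S", number 137, pattern "SU(US)(SS)", and tree "T4"; the pattern predicts six syllables with stress on the first and the last three.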

Also accompanying the sentences are markings of the perceived and
automatically detected prosodic patterns. Above each syllable is a number
(between -5 and +5) specifying the stress score (SS) for that syllable, where
-5 means all five listeners heard the syllable as reduced, while +5 indicates
all heard it as stressed (see Lea, 1976c). Each syllable (or portion of
speech including more than one syllable) which was automatically located as
a stressed syllable is underlined. Thus, only syllables with stress scores
of +3, +4, or +5 should end up being underlined. Any underlined portion
that does not include a syllable perceived as stressed (that is, any portion
with no SS ≥ +3) is a false alarm in stress location. Any syllable with
SS ≥ +3 that is not underlined is a missed stress.
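
The false-alarm and missed-stress definitions above can be restated as a small sketch. It is an illustration only: the function name, the representation of underlined portions as inclusive syllable index ranges, and the example spans are assumptions, not part of the report.

```python
from typing import List, Sequence, Tuple

STRESS_THRESHOLD = 3  # SS of +3, +4, or +5 counts as perceived stress


def score_stress_location(
    scores: Sequence[int],
    located_spans: Sequence[Tuple[int, int]],
) -> Tuple[List[Tuple[int, int]], List[int]]:
    """Return (false_alarms, missed_stresses).

    scores        -- one stress score (-5..+5) per syllable
    located_spans -- underlined portions as inclusive (first, last)
                     syllable indices (an assumed encoding)
    """
    covered = set()
    false_alarms = []
    for first, last in located_spans:
        indices = range(first, last + 1)
        covered.update(indices)
        # A located portion with no perceived-stressed syllable is a false alarm.
        if not any(scores[i] >= STRESS_THRESHOLD for i in indices):
            false_alarms.append((first, last))
    # A perceived-stressed syllable outside every located portion is missed.
    missed = [
        i for i, s in enumerate(scores)
        if s >= STRESS_THRESHOLD and i not in covered
    ]
    return false_alarms, missed
```

With the scores +5 -1 -1 +4 +5 +5 and hypothetical located portions covering syllables 0, 1, and 3, the portion at syllable 1 would be a false alarm and syllables 4 and 5 would be missed stresses.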

Another form of information displayed on the sentences concerns phrase
boundaries detected from Fo contours. Every detected phrase boundary is
marked by a vertical bar approximately at the position in the utterance where
it was detected. Each position where a boundary was expected but not detected
(i.e., a missing boundary) is shown by a star.
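
As a rough illustration of how these boundary marks might be tallied, the sketch below counts bars and stars in one marked sentence and recovers the unmarked text. The function name is hypothetical, and the sketch assumes "|" and "*" appear only as boundary marks in the sentence line (the footnote asterisk on some identifiers is a separate use and should not be passed in).

```python
from typing import Tuple


def count_boundary_marks(marked_sentence: str) -> Tuple[int, int, str]:
    """Return (detected, missing, plain_text) for one marked sentence.

    detected -- number of vertical bars (detected phrase boundaries)
    missing  -- number of stars (expected but undetected boundaries)
    """
    detected = marked_sentence.count("|")
    missing = marked_sentence.count("*")
    # Remove the marks and normalize spacing to recover the plain sentence.
    stripped = marked_sentence.replace("|", "").replace("*", "")
    plain_text = " ".join(stripped.split())
    return detected, missing, plain_text
```

Because every detected boundary receives a bar, whether or not one was expected at that position, the two counts alone do not separate correct detections from false ones.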




SUBSET 1A. One Stress Per Constituent

PSS S2-SS,T1
    +5 +4
    Men* know.

PSS S4-SUS,T2
    +5 +1 +4
    Men will kn|ow.

PSS S8-SSS,T3
    +5 +2 +3
    Men* know* Ron.

PSS S12-SUSS,T4
    +5 -1 +4 +2
    Men will | know* Ron.

PSS S16-SSSS,T5
    +5 +4 +5 +2
    Men* know | Ron* now.

PSS S24-SSSS,T6
    +5 +3 +5 +5
    Men* owe* Ron | rum.

PSS S26-S(US),T1
    +5 -1 +4
    Men en*roll.

PSS S28-S(SU),T1
    +5 +5 -2
    Men | worry.

PSS S37-SU(US),T2
    +5 -3 -3 +5
    Men will en*roll.

PSS S39-SU(SU),T2
    +5 -3 +4 -1
    [Men] will | [worry.]

PSS S42-SS(US),T3
    +5 +5 -1 +4
    Men* know | Marie.

PSS S42-SS(SU),T3
    +5 +5 +4 -1
    Men* know* Mary.

PSS S47-S(US)S,T3
    +5 -3 +4 +4
    Men* enroll* Ron.

PSS S51-S(SU)S,T3
    +5 +5 -2 +5
    Men | worry | Ron.

PSS S56-SS(UUSU),T3
    +5 +3 +2 -5 +5 -3
    | Men* know | Leonora.

PSS S57-SS(USU),T3
    +5 +3 -2 +5 -3
    Men* know Ma|ria.

PSS S58-SS(SUU),T3
    +5 +5 +5 -5 -1
    Men* know | Melanin.




SUBSET 1A. One Stress Per Constituent (cont.)

PSS S66-S(US)(US),T3
    +5 -1 +4 -1 +5
    Men* enroll Ma|rie.

PSS S67-S(US)(SU),T3
    +5 -3 +3 +5 -2
    Men enroll* Mary.

PSS S68-S(SU)(US),T3
    +5 +4 -2 -2 +5
    Men | worry Ma|rie.

PSS S69-S(SU)(SU),T3
    +5 +4 -2 +4 -2
    Men | worry | Mary.

PSS S107-SS(UUS),T3
    +5 +4 -3 -2 +5
    Ron* knew a ma|rine.

PSS S108-SS(USU),T3
    Ron* knew an | airman.

PSS S136-SU(US)(SU),T4
    +5 -2 -2 +4 +5 -2
    Ron will en|roll | airmen.




SUBSET 1B. Two or More Stresses Per Constituent:
Expansions of Determiner.

PSS S137-SU(US)(SS),T4
    +5 -1 -1 +4 +5 +5
    Ron will en*roll nine men.

PSS S138-SU(US)(SS)*,T4
    +5 -1 +4 -2 +5
    Ron will en|roll your* men.

PSS S139-SU(US)(SS),T4
    +5 -1 -3 +4 +5 +5
    Ron will en|roll | all men.

PSS S140-SU(US)(SS),T4
    +5 -1 -3 +4 +5 +5
    Ron will enroll | no men.

PSS S141-SU(US)(SUS),T4
    +5 -1 -2 +4 +4 +5
    Ron will enroll | any men.

PSS S142-SU(US)(SUS),T4
    +5 -1 -3 +4 +5 -3 +5
    Ron will en*roll | many men.

PSS S143-SU(US)(SUS),T4
    +5 0 -2 +4 +5 +4 -2
    Ron will en*roll | nine | airmen.

PSS S144-SU(US)(SSU)*,T4
    +5 0 -2 +4 -1 +5 -2
    Ron will en|roll your | airmen.

PSS S145-SU(US)(SSS)*,T4
    +3 0 -2 +2 +1 -3 +4
    Ron will en|roll* all your | men.

PSS S146-SU(US)(SSS),T4
    +5 -1 -3 +4 +5 +5 +4
    Ron will en|roll* all nine men.

PSS S147-SU(US)(SUSS),T4
    +5 -1 -3 +4 +5 -3 +5 +4
    Ron will en|roll | any nine men.

PSS S148-SU(US)(SSU),T4
    +5 -1 -3 +4 +5 +5 +5 -3
    Ron will en|roll | all nine airmen.

PSS S149-SU(US)(SUSSU),T4
    +5 0 -3 +5 +5 -2 +5 +4 -3
    Ron will en|roll | any nine airmen.

*The initial prediction was that "your" would be stressed. It
now seems more likely that "your" will be unstressed in these
sentences.




SUBSET 1C. Prenominal Adjectives, Participles, and Adverbs
(with 4 or less syllables in NP)


+5 

-1 

+ +3  +4  +5 

PSS 

Ml-SS (US) (SS) ,T4 

R ion  will  en  Iroll  vouna  men. 

+5 

-1 

-3 

+2  +5  -3  , +4 

PSS 

M3-SS (US) (SUS) ,T4 

Ron  yill 

■SO. 

Iroll  ! moral  1 men. 

+5 

-1 

-5 

+3  , -3  +5 

PSS 

M4-SS (US) (SUS) ,T4 

Ron  will 

en11 

r roll  | willing  men. 

+5 

-1 

-5 

, +4  , +5  +5  -3 

PSS 

M5-SS (US) (SSU) ,T4 

Ron 

will 

en 

iroll  I young  airmen. 

+5 

-1 

-5 

+3  -5  , +5  +5 

PSS 

M6-SS (US) (USS) ,T4 

Ron  will 

en 

1 roll  a ! young  man. 

+5 

-1 

-5 

, +3  +5  +4  +4 

PSS 

M8-SS (US) (SSS) ,T4 

Ron 

will 

en 

liolli  nine  young  men. 

+5 

-1 

-5 

, +5  -3  +2  +5 

PSS 

M9-SS (US) (SSS)* , T4 

Ron 

will 

en 

Iroll  your ! young  men. 

+3 

-1 

-5 

+4  +5  +4  +5 

PSS 

M10-SS (US) (SSS) ,T4 

Ron 

will 

en 

iroll i mean  young  men. 

+5 

-1 

-5 

+4  -3  +4-5  . +5 

PSS 

Mll-SS (US) (USUS) ,T4 

Ron  will 

en.'roll!  immoral  | men. 

+5 

-1 

-5 

+4  -5  +5  -4  +5 

PSS 

M12-SS (US) (USUS) ,T4 

Ron 

will 

en 

Iroll  a | moral  man. 

+5 

+3 

-2 

+4  -5  +5  -i  +4 

PSS 

M13-SS (US) (USUS) ,T4 

Ron 

will 

en 

Iroll  a lyoung I ma  rine. 

+5 

-1 

-4 

, +4  +5  -3  -2  +5 

PSS 

M14-SS (US) (SUUS ) ,T4 

Ron 

will 

en 

Iroll  1 mannerly  ! men. 

+5 

-1 

-4 

+4-5  +5  +5  _3 

PSS 

M15-SS (US)  (USSU) , T4 

Ron 

will 

en 

Iroll  a lyoung  airman. 

+5 

-1 

-4 

+4  , +5-2  +4  +5 

PSS 

M16-SS (US) (SUSS ) , T4 

Ron 

will 

en 

Iroll  | any  young  men. 

+5 

-1 

-4 

+4  +5  -2  +4  +5 

PSS 

M17-SS  (US)  (SUSS)  ,T4 

Ron 

will 

en  Iroll  many  young  men. 

+5 

-1 

-4 

+4  +5  -2  +3  +4 

PSS 

M18-SS (US) (SSUS) ,T4 

Ron 

will 

en 

* roll  i only  young  men. 

*The  initial  prediction  was  that  "your"  would  be  stressed.  It 
now  seems  more  likely  that  "your"  will  be  unstressed  in  these 
sentences. 




SUBSET  1C.  Prenominal  Adjectives,  Participles,  and  Adverbs 
(with  4 or  less  syllables  in  NP)  (Cont. ) 


PSS 

M19-SS (US) (USSS) ,T4 

+5 

Ron 

-1  -4  +4 

will  en*  roll 

+5  +5  -3  +4 

nine  moral  men. 

PSS 

M20-SS (US) (USSS) ,T4 

+5 

Ron 

-1  -4  +5  -5 

will  enroll  a 

+4  +3  +5 

new  young  man. 

PSS 

M21-SS (US) (USSS) ,T4 

+5 

Ron 

-1  -4  +5  -2 

will  enlroll  a 

. +5  +3  +5 

mean  younq  man. 

PSS 

M22-SS  (US)  (SSSU) ,T4 

+S  -1  -4  +4 

R|on  will  enlroll 

. +5  +3  +5  -2 

nine  young  airmen. 

PSS 

M23-SS  (US ) (SSSU ) * , T4 

+5 

Ron 

-1  -4  +4  -2  +3  +5  -1 

will  enjroll  your'  vounq  airmen. 

PSS 

M24-SS (US ) (SSSS ) * , T4 

+5 

Ron 

-1  -5  +4  +5  -i  +3  +5 

will  en|roll|  all  your | younq  men. 

PSS 

M25-SS (US) (SSSS ) * , T4 

+5 

Ron 

-1  -4  +4  -2  +4  +4  +5 

will  en  Iroll  your | nine  vounq  men. 

PSS 

M26-SS  (US)  (SSSS-)  * , T4 

+5  -1  -5  +4  -2  +4  +3  +5 

R |on  will  enlroll  your  | new  vounq  men. 

PSS 

M27-SS (US) (SSSS) ,T4 

+5  -1  -5  +4  +4  +4  +2  +4 

Ron  will  en  Iroll*  new  mean  vounq  men. 

*The initial prediction was that "your" would be stressed. It
now seems more likely that "your" will be unstressed in these
sentences.




SUBSET 1D. Prenominal Adjectives, Participles, and Adverbs
(with more than 4 syllables in NP)


+5 

-1 

+4 

-4  • 

-4 

+4  -5  +2  +2 

PSS 

M28-SU (US)  (USUUS) , T4 

Rsa  yill)-.. 

-£Pl 

roll 

a 

moral  1 marine. 

+5 

-1 

-5 

+4 

-4 

-1  +5  -3  +5 

PSS 

M29-SU (US ) (UUSUS ) , T4 

Ron  will 

J£D 

lro34 

_ajj 

iml moral  man. 

+5 

-1 

-5 

. +1+ 

-3 

. +4  -3  , +5  -3 

PSS 

M30-SU (US) (USUSU) ,T4 

Ron  will 

en 

roll 

imlmorall  airmen. 

+5 

-1 

-3 

i +lt 

-2 

. +4  -3  +5  -2 

PSS 

M31-SU (US) (USUSU), T4 

Ron  will 

en 

Iroll 

_a 

Imoral  airman. 

+5 

-1 

-5 

+4 

-4 

+5  -1  +4  -2 

PSS 

M32-SU (US ) (USUSU), T4 

Ron 

will 

en 

Iroll 

a 

1 lonely  airman. 

+5 

-1 

-5 

i +1* 

-4 

+5  _4  +4  +3 

PSS 

M33-SU (US ) (USUSS ) , T4 

Ron 

will 

en 

Iroll 

a 

Imoral  vounq  man. 

+5 

-1 

-5 

+4 

-4 

, +5  +3  -3  +4 

PSS 

M34-SU (US ) (USSUS) ,T4 

Ron 

will 

en 

| roll 

a 

lyounq  moral  man. 

+5 

-1 

-5 

, +4 

+5  -1  +5  -1  +4 

PSS 

M35-SU (US) (SUSUS) ,T4 

Ron 

will 

en 

Iroll 

Imoral  | lonely  men. 

• 

+5 

-1 

-5 

i +1+ 

+5  _i  +4  -3  +4 

PSS 

K36-SU (US ) (SUSUS ),T4 

Ron  will 

en 

Iroll 

1 lonely  1 moral  men. 

PSS 

M37-SU (US) (SUSUS ),T4 

+5  -1  -5  +4 

R |on  will  en  Iroll 

+5  -3  +4  -3  +5 
1 many  moral  men. 

PSS 

M38-SU (US ) (SUSUS ),T4 

+5 

Ron 

-1  -5 

will  en* 

+4 

Iroll 

. +5  -1  , +5  -3  , +5 
1 only  | moral  men. 

PSS 

M39-SU  (US ) (USUUS  ),T4 

+5  "5  , 

Ron  will  en i 

+4 

roll  I 

+5  -4  +4  -2  +4 

many  ' willinq  men. 

PSS 

M40-SU (US) (SUSUS ),T4 

+5 

Ron 

-1  -5  . 

will  en 1 

+4 

roll  I 

+5  _4  +5  -4  +5 

nine  iml  moral  men. 
• 

nr*  r 

M41-SU (US ) (SUSUS ) , T4 

+5 

Ron 

-1  -5 

will  en 

roll  1 

+4-3  +4  -4  +4 

any  1 young  marine. 

PSS 

M42-SU (US ) (SSUSU ) , T4 

+5 

Ron 

-1  -5 

will  en 

Iroll* 

+5  , +5  -3  , +5  -2 

nine  moral  airmen. 

PSS 

M43-SU (US ) (SUSSS ) , T4 

+5 

Ron 

-1  -5  . +4 

1 will  enl  roll 

, +5  _i  +4  +3  +5 

lonely  mean  young  men 

PSS 

M44-SU (US) (SSUSS ) , T4 

+5 

Ron 

will  eiii 

+4  , 

roll  1 

+5  +4  -4  +3  v5 

new  moral  young  men. 



SUBSET 1D. Prenominal Adjectives, Participles, and Adverbs
(with more than 4 syllables in NP) (cont.)

PSS 

M45-SU  (US  ) (SSSUS  ) , T4 

+5 

Ron 

-1 

will 

in* 

+4 

roll 

+5  +5  +4  -4  . +5 

new  young  moral  men 

PSS 

M46-SU (US ) (SSS) ,T4 

+5 

Ron 

-1 

will 

-5  , 

en  i 

+4  . 

roll  1 

+5  +3  +4 

well  known  men. 

PSS 

M47-SU (US) (SUSS) ,T4 

+5 

Ron 

-1 

will 

-5  , 

en  | 

+4  , 
roll  I 

+5  _4  +4  +4 

really  young  men. 

PSS 

M48-SU (US ) (SUUSUS) , T4 

+5 

Ron 

-1 

will 

in  | 

+4  , 
roll 

+5  -2  -2  +4  -4  , +5 
really  immoral  men. 

PSS 

M49-SU (US ) (SSUUSS) ,T4 

+5 

Ron 

-1 

will 

-5  , 

en  I 

+4 

roll 

+5  -2  +5  -4-3 

really  1 mannerly 

+3  , +5 
young  ! men. 

PSS 

M50-SU (US ) (SUSSS ) , T4 

+5 

Ron 

-1 

will 

-5 

en* 

+4 

roll 

, +4  -3  , +3  +2 

i reallv  IweLl  known 

, 

+4 

men. 

+5 

-1 

>5 

+4 

+4  -2  -4+4  -4 

PSS  M51-SU(U3)  ^SUSUSUSS) ,T4  Ron  will  en I roll  1 really  immoral  1 

+4  +3  +4 

well  known  ' men. 

+5  -1  -5  , +4  , +4  -2 , +3  +2 

PSS  M52-SU (US ) (SUSSUSUS) , T4  Ron  will  en I roll  really  1 well  known 

-4  +4  -4  , +5 

im  imordi  i men. 

+5  -1  -5  +4  | +5  -3i  +5  -1 

PSS  M53-SU(US) (SUSUUSUS) ,T4  R Ion  will  en*  roll  really  willing 

-4  +4  -3  , +5 
immoral  I men. 

+5  -1  -5  i A i +5  -3  i +4  -2 

PSS  M54-SU (US ) (SUSUUSUSS)  ,T4  R_on  will  en  roll  really  willing 

-4  , +5  -4  +3  +5 

im] moral  young  men. 




SUBSET 1E. "Flying-Planes" Paradigm


PSS 

M229- (US) 

U 

(SUS)  ,T3 

+5  +1 
Lawmen 

-1 

are  | 

+5  -1  +5 
lying  men. 

PSS 

M230-  (SU) 

U 

(SU)S,T4 

+5  +1 
Lawmen 

-2 

are  | 

+5  -1  , +5 

ruling  | Maine. 

PSS 

M231- (US) 

U 

(SUS)  ,T3 

+5  -1 
Airmen 

ar  le 

+5  -2  +3 

lying  men. 

PSS 

M232- (SU) 

U 

(SU)  (SU) , T4 

+5  -3 
Airmen 

are  | 

_4  -2  , +5  -3 
eyeing  | women. 

PSS 

M233-  (SU) 

U 

(SUS) , T3 

+5  -2 
Airmen 

-1 

ar  | e 

+5  -2  +4 

erring  men. 

PSS 

M234-  (SU) 

U 

(SU)  S,  T4 

+5-4  -1 

Women  are  | 

+5  -2  +4 

airing  | wool. 

PSS 

M235-  (SU) 

U 

(SUS) ,T3 

+5  -1 
Lawmen 

-2 

are  | 

+4  -1  +5 

rummy  men. 

+5  -1 

-2 

+4  -1  +5  -3 

PSS  M236- (su)  U (SU)  (SU),T4  Lawmen  are  [ ruling  | women. 




SUBSET 2C. Movement of stress in the first constituent.


-4 

+5  +5  , +5 

PSS 

SI 03- (US ) S S,T3 

A ; 

man*  knew*  Ron. 

-2 

+5  -4  +3  +5 

PSS 

S104- (USU)  SS,T3 

A 

woman  1 knew*  Ron. 

-4 

+5  -4  +5  +5 

PSS 

S105- (USU)  SS , T3 

An 

airman*  knew*  Ron. 

— * 

$0+4  +5  +5 

PSS 

S106- (UUS ) SS , T3 

1a 

. marine  | knew*  Ron. 

A 

+4  +2  -4  +5 

PSS 

S109-  (US ) S ( US  ) , T3 

m] 

onroe*  knew  a*  man. 

-4 

+5  +4  -2  +5 

PSS 

SllO-(US)  S ( US ) , T3 

A 

man*  linew  Ma  jrie. 

+5-1  +5  -3  , +4 

PSS 

Slll-(SU)  S ( US ) , T2 

M 

[ary*  knew  a | man. 

-4  +5-2  +4  -4  . +4  -3 

Maria  ! knew  an  1 airman 

PSS 

S112- (USU) S (USU) ,T3 

SI  13 -(USU)  S (USU)  , T3 

-3 

+5  -3  +4  0 +5-3 

PSS 

A 

woman  1 Knew*Ramona. 

-2  +5  -3  +5  -3  ,+4  -2 

PSS  SI]  4-  (USU)  S (USU),T3  An  airman  I knew  Ralmona. 


-3  +5  -3  -1  -5  , +4  | +1  -4  +5 

PSS  S115- (USU)  U (US)  (UUS ) , T3  A|n  airman  will  enlrolll  Marianne. 


+2  -r.4  +5-2  -1  -4  +4  -4  -3  , +5 

PSS  S116-  (UUSU)  U (US)  (UUS),T3  he  longra  I will  enlroll  a majrine. 

-4  -1  +5-4-3,  -1  -5  , +4  -4  -3  +4 

PSS  S117- (UUSUU) U (US ) (UUS) , T3  An  Armenian  I will  en Iroll  a m larine 

-4  -4  +5  -1  -4+4  +1-4+5  -2 

PSS  S118- (UUS)U(US) (UUSU) ,T3  A marine  will  enlroll  1 Leonora. 

-3  -5  +5  _i  _5  +4  -5  -o  ,+5  -4-2 

PSS  S119- (UUS) U (US) (UUSUU) ,T3  1a  marine  will  enlroll  an  at  imenian 




SUBSET  3D.  Verb/Noun  in  Pairs 

-2  +4  +5  0 . +4  0 -5  -1 

PSS  X15- (SSSU) (USU) (USUU) ,T3  Our  new  | object  increases  in- 

+4  -3-5  -1 

accuracy. 

+5-1  +5  -2  +4  -2  +5  -i  , -5 

PSS  X16-(SUS)  (US)  (USUU)  (USUU),T243  Very  few  ob  ! j ectl  to  | increas  les 

-2  +5  -3  -5-i 
jin  a [ccuracy. 


PSS  X17- (UUSU) (US) (US) (UUSU) (USU)  U (SU) (USU) , T244 

-3  +5  -3  -2  . +5  -1  +4 

The  com  [puter  can [not  permit 

0-5  +3-2  -4  +5  0 -3 

vio  | lations  of  | syntax  lor 

f4  -1  -4  +5  _2 

Conflicts  in  | schedule. 


-5  +5  -3  ,-4  -5 

PSS  X18- (USU) (UUSUSU)U  S U (USU)  U (USU) , T245  The  record  | of  a 

+5  +1  . -3  , +3  -1  , +5  -2  -5 

firearm  I per  Imit  will  I show  if  the 

+5-1  _4  -4+4  0 

suspect  ! is  a I convict. 

-5  +3  +5  -1  -1  +3  -5. 

PSS  X19- (USSU) (US) (USU) (USU) , T246  The  two  records  , perlmit  al 

+5  0 -4  +5  -4 

conflict  I in  | schedule. 

-2.  +5  -1  , -3  +5  -2 

PSS  X20- (USU) (US) (UUSU) (USUUS) ,T247  His | records  | conflict  with 

-3  . +50  -4  _5- 5 -2  , +4 

his  i permit  in  | several  ways . 

+5  -4  +5  0 -3  +5 

PSS  X21- (susu:  U(SSU)  (UUSUUS) ,T248  Former  convicts  | are  | pr  lime 

+5  0 -4  -5  +5  -1  -2  +3 

suspects  for  the  I hi lacking  case. 




SUBSET  3D.  Verb/Noun  in  Pairs  (cont.) 

-3  +4  , +3  r5  0 

PSS  X22- (USSUUS) (US ) U (USUU) U (US ) (US) , T249  The  news  I mag  azine 


+5 


-2  , +5 


-2 


Crime  sus  , pects  that 


ll 


+5  -1  -3  -3  . -2  , +4 

packers  will  con  vict 


0 +5 

them ! selves. 


PSS  X23-S (US) (USU) (UUSU) / T250 


+5  0 +5  -2  , +5  0 | -5 

Let 1 s*  record  our  progress  to 


-4  . +4  -3 
the  I tower. 


PSS  X24-S (US) (SSU3UU) , T251 


+5  -3  , +5  0 , +5  -2 

Let 1 s pro! qress  towards  ! record 


+5.-4  0 
al I titude. 


+3  -4 


+4 


-2 

I -2 

+5 

-1 

Did 

lyou 

increase 

-3  | 

+3 

-5 

to  1 

per  | : 

mit 

the 

+5  -1  -5  -7  +1  -2  | 

program  to  pro  qress  mor I e 

+5  -4  ,-1 

rapid l ly? 


PSS  X26-U  U (SU) (USU) (USUSU) (USU) (UUSU) (UUSU) (SU) , T253 


-2  , +3  -3  +4  | -5  +4  o 

Did  I you  record  I the  incre 


-5  |+4  -5  +5  | 0 

ill  1 rate  or  pro  I qress 


-4  +5  -2  -3  -5  | +4  -1 

re  1 sul I ting  from  the  I permit 


-4  . -5  +4  -3  , +5  0 

the  I department  I issued? 




SUBSET  3D.  Verb/Noun  in  Pairs 


PSS  X28- (SUSS) (USU)U(SU)S  U S U U (USU) (SU) (USUUU) , T255 

-1  -5 . +5  . +3  -5  . +5  -4  -2 

With  a | strong  I hand,  the  I lawmen  were 

+3-2  +4  -2  +4  -1 

ruling  I Main  le,  |but  still  1 they 




SUBSET  3F.  Phonetic  Influences  on  Simple  Sentence  Structures. 


PSS 

X253 

- S0NQR,DECL1 

+5  -1  +3  +5 

Ron  may*  know*  May. 

PSS 

X254 

- FRIC,DECL1 

+5  -2  :-3  , +5 

Sue  has  seen  Fay. 

PSS 

X255 

- STOP, DECL1 

+5  , +3  , +5 

Plete  cam  take  Kay 

PSS 

X256 

- STOP, C0MM1 

+5  , +5 
Take  ! Kay. 

PSS 

X257 

- STOP,  C0MM2 

+4  +3  x5 

Take*  Kay*  pop. 

PSS 

X258 

- STOP, COMM3 

+5  +5  +4 

Take  * Kay  | to  o. 

PSS 

X259 

- FRIC,C0MM1 

+5  +4  | +5 

Serve*  Sue  1 fish. 

PSS  X260  - FRIC,  CQMM2 
PSS  X261  - STOP, Y-Nl 
PSS  X262  - STOP, Y-N2 
PSS  X263  - FRIC, Y-N2 
PSS  X264  - FRIC, Y-N2 
PSS  X265  - STOP, WH1 
PSS  X264  - STOP, WJ2 
PSS  X2C-7  - FRIC, WH1 


+5 

. +3, 

+5 

Show 

* Sugl 

how. 

■ +5 

i +2 

+5 

Can 

J^-te. 

1 take 

K ^y? 

-1 

+5 

, +4 

,+5 

Can  1 

Pete 

1 type  t i oo? 

-2 

+5 

+3  + 

s 

Has 

Sue*  seen  F 

Q 

+5 

+2  i 

+4 

H as 

Sue* 

seen  1 

hO  W? 

i r-  ■*>  i 1.  . 

-o  | th  Tp 

Who  can  I take  I Kay? 

+5  -2  +4  +5 

Where  can  I Pete*  Pack? 

+4  -1  +4  +5 

Who  1 has*  seen  Fay? 




SUBSET 4C. NP-PP-PP Subordination


L262-SU (US) (SUSS) (US) ,T227 

+3 

-1 

-5  +3 

+5  -3 

+3  +4 

PSS 

Ron 

will 

en*  roll 

Imanv  young  men 

-5 

+5 

in*May. 

+5 

-1 

-5  +3  , 

+5  -3 

+3 

PSS 

L263-SU (US) (SUSS) (US),T228 

Ren 

will 

en*roll  I 

many 

ycu  ng 

+4 

-3 

+5 

men 

from* Maine. 

L264-SU (US) (SUSS) (US) (US,T229 

+5 

-1 

-5  ,+3  , 

+5  -4 

+3 

PSS 

Ron 

will 

en Iroll | 

many 

young 

+4 

-3 

+5  . 

-2  +4 

men 

from 

1 Maine  I 

in  May. 

+5 

-1 

-5  +3  , 

-4  -4 

+3 

PSS 

L265-SU (US) (SUSS) (UUSUS) , T227 

Ron 

will 

en«roll  I 

many  young 

+5  -3  -4  +5  -4  +5 

men  j.n  the  | month  of  1 May. 


+5  -1  -5  +3  +5  -4  +3 

PSS  L266-SU(US) (SUSS) (UUSU) (US) ,T230  Ron  will  en Iroll  j many  young 


-1-4  -3 

+5  -1  -4 . +5 

men 

1 into  the 

1 army  in i May 

+1 

-5  +5 

-2  -5  +5  -3 

PSS  L267-SU(US)  (UUSU)UU  (UlTS ) , T231  Put 

the  block 

on  the  1 table 

0 -5  +2  -5  +5 

which  is  the  I door . 


+1  -5  +5 , -1  -5  i-2  -5  , 

PSS  L268- (US) S3 (UUSU) (UUUUS),T232  the  bio  ick  which  is  [on  the! 

+5  -2  +4,-4  -1  -5  | +5 

table  I ov I er  by  the  I door . 




SUBSET 6A. Commands.


PSS 

S259 

- S S (US) ,T21 

+5  +4  -2,  +5 

Run  mine  a lone. 

PSS 

S260 

- S S (SU) , T21 

+5  +4  +5  -i 

Run  mine  early* 

PSS 

S261 

- S S (SUU) , T21 

+5  +4  .+5-4  +1 

Run  mine  anyway. 

PSS 

S262 

- S S (SUU) , T21 

+5  +5  +4-4  +i 

Warn  1 Ron  anyway. 

PSS 

S263 

- S(US)  S,T21 

+5  -3+5  , +4 

Warn  Ma  rie  now. 

PSS 

S264 

- S (SU)  S,T21 

+5  +4  -2  +5 

Warn  1 Murray*  now. 

PSS 

S275 

- S (SUU)  S , T21 

+5  +5-5-2  +3 

Warn  1 Marion*  now. 

PSS 

M55  - 

S (USU)  S , T21 

+5  -3+4-3  +5 

Warn  Ma  Iria  Inow. 

PSS 

M56  - 

S (UUSU)  S,  T21 

+5  +4—4  +4-3  +5 

Warn  Leo  [nor a 1 now. 

PSS 

M65  - 

S U (SU) # T22 

+5  -1  +4  -3 

Loan  me*  money. 

PSS 

M66  - 

S (US)  U,  T22 

+5  -4  +4  o 

Loan  Ma  !rie  one. 

PSS 

M67  - 

S (SU)  U,  T22 

+5  „ +5  -3  +1 

L [oan*  Mary  one. 

PSS 

M68  - 

S (US)(SU),  T22 

+5  -2  +3  +5 

Loan  Ma  rie  1 aoney. 

PSS 

M69  - 

S (SU)  (SU)  , T22 

+5  +5  -2  +4  -1 

Loan*  Mary  | money. 

PSS 

M70  - 

S S (USUU) , T22 

+5  +4  -4  +5  -3  0 

Loan*  Ron  a 'luminum. 

PSS' 

M71  - 

S S (UUSUUo),  T22 

+5  +4  -2  -5  , +5  -4-1 

Loan  | Ron  an  a luminum 

PSS 

M72  - 

S (USU)  S,  T22 

+5  -3  +4  -3  , +4 

Loon  an  1 airman  -urn. 

+4 

wire. 


SUBSET 6A. Commands (cont.)
PSS  M73  - S (UUSUU)  S,  T22 
FSS  M74  - S (UUSUUSU)  S,  T22 




+5 

-5 

-2  +4-4-2 

+5 

Loan 

an 

Ar  Imenlan  | 

rim. 

+5 

-5 

-1  +5-4-2 

+4  -3 

Loan 

an 

Ar  menian 

airman 

' 

i 


1 




SUBSET 7B. Yes/No Questions

PSS 

Ml  43  - 

U 

(SS)  S,  T28 

-1 

Will* 

+5  -4  +H 

your  men  kn  ow? 

PSS 

Ml  44  - 

U 

(SSS)  S,  T28 

-2 

Will* 

+5  -3  -2  +4 

all  your  men*  know? 

PSS 

Ml  45  - 

U 

(SSSSU)  S, T28 

-3 

will  .J 

+5  -3  +4  +5  -3  +4 

all  vour  nine  airmen  know? 

PSS 

M153  - 

U 

S (US)  (r3)  /T30 

-3 

will  , 

+5  -4  +5  +2  +5 

Ron*  |en  roll  young  jmen? 

PSS 

M154  - 

U 

S (US)  (SUSS),T30 

-3 

will 

+5  -3  +4  +5  -4  +1  +4 

Ron*  *roll  many  vouna  me  In 

PSS 

Ml  5 5 - 

U 

S (US) (SUSSU) , T30 

-2 

Will 

+5  -3  +b  +5  -2  +3 

'Ron*  en  roll  Ireallv  vouna 

+5  -2 
air  | men? 


PSS  M156  -US  (US) (SUSUUSUSS) / T30 

-2  +5  -4  +4  +5  -3  . +4  -2  -3  ,+5  -4 

wjjli  |R2d  en*  roil  Legally  I uilUas  iujlmotal 


J--3  4-7 

• j . ■ 

vouna  1 men? 

-4  -3 

i +5 

+4  -2  . +4 

PSS 

M159  - U 

U S S (SS)  /T32 

Will  I 

lowe* 

Ron  my  Irlnq? 

-5  -2. 

+5 

+4  +5  -1  +4  ,-3 

PSS 

Ml  60  - U 

U S S (SSSU) ,T32 

Will  i! 

owe* 

Ron  1 all  my  mon  iey? 

-4  -2 

, +5 

+5-3  +4  -3  +5: 

PSS 

Ml 63  - U 

U S (SUSU)  S,  T32 

wm  i 

1 owe 

1 many  1 airmen  ru  lm? 



SUBSET 7D. WH Questions


+5 

-1  -4  +5  +4 

PSS 

LI  - S U 

' (US)  S,  T39 

When 

will  Mairie*  know? 

+5 

-1  +4  -3  +5 

PSS 

L2  - S U 

' (SU)  S,  T39 

When 

will*  Mary*  know? 

+5 

+1  -4  +4  +4  +5 

PSS 

L3  - S U 

f (US) (US),  T39 

When 

will  Mairie*  en  roll? 

+5 

-1  -4  .+4  +5  -3 

PSS 

■L4  - S U 

f (US)(SU),  T39 

When 

will  Mairie*  marry? 

+4 

-1  +3  -4  +5  +4 

PSS 

LI  4 - S 

US  (US)  S,  T40 

When 

will  |Ron*  en  roll*  men? 

+4 

-1  +5  -4  +4  +3 

+5 

PSS 

LI  5 - S 

US  (US) (SS) ,T40 

When 

will  | Ron*  en  roll  vounq Imen? 

+4 

-1  , +5  -5,  +3  ,+3  -4 

PSS 

L16  - S 

US  (US)  (SUSSU)  ,T40 

When 

willl  Ron  enlroll  Ireally 

, +3 

+5  -3 

Ivouna  airmen? 

+5 

+3  +4  -4  +3 

PSS 

L27  - S 

S S (US ) , T98 

Who* 

loaned  1 rum  tol  Ron? 

+5 

-2  , +5  +4  -3  +4 

PSS 

L28  - S 

U S S (US),  T101 

Who  will!  warn!  Ron  iln  Mav? 

+5 

-2  , +4  +3  +5  0 -4 

+5 

PSS 

L36  - S 

U S S (US),  T110 

Who  will  1 Ron  loan  rum  to  in  I 

May? 

+5 

-2  +4  +4  -4  +5 

PSS 

L37  - S 

U S S (US),  T112 

What 

willl  men  1 loan  tol  Ron? 

+5 

-2  +5  +5-4  +4 

i +5 

PSS 

L39  - S 

U S S (US)  S,  Til 4 

What 

| will | men  | loan  to  iRon 

Inow? 

+5 

-2  +5+3  +3  -3 

, +5 

PSS 

L40  - S 

U S S S (US),  T115 

What 

1 will  Imen  1 loan*  Ron  in 

iMay? 

+3 

-2+5  . +4  -4  +5  -3 

i +5 

PSS 

L41  - S 

U S S (US) (US) , T116 

What 

willl  men  1 loan  to IRon  in 

Mav? 

+5 

-2  +5  +4  +4  -4 

+5 

PSS 

L42  - S 

U S S S (US) , T117 

When 

will  men*  loan*  rum  to  Ron? 



SUBSET  8A. 
PSS  L159  - 

PSS  LI 60  - 

PSS  L161  - 

PSS  L162  - 

PSS  L163  - 

PSS  L164  - 

PSS  LI 65  - 

PSS  L166  - 

PSS  L167  - 


Coordinate  Sentences. 
S S S U S S S,  T182 


S S S U S S S,  rrl82 


S S S U S S S,  T182 


S S S U S S S,  T182 


S S S U S S S,  T182 


S S S U S S S,  T182 


S S S U S S S,  T182 


S S S U S S S,  T182 


SUSSU5USS,  T183 


+3  +2  +5  -3  +5  +2 

Ron*  knew*  Lvnn  and  Lou*  knew* 

+5 

Wayne* 

+5  +i  +5  -1  +5  +1 

Ron*  knew*  Ly Inn  or  | Lou*  knew 

+5 

Wayne. 

+4  +2  +5  -3  +4  +2 

Ron  ! knew*  Ly  |nn  and  | Lvnn*  knew  | 

+5 

Wayne. 

+4  0 +5  -1  +4  0 

Ron*  knew*  Lynn  I or  Lvnn*  knew  I 

+5 

Wayne. 

+4  +2  +5  -3  . +5 

Ron*  knew  | Lynn  | and  | Lynn* 

+2  +5 

knew  | Ron. 

+4  +1  +5  -2  +5  +1 

Ron*  knew*  Lylnn  or  Lynn*  knew  | 

+5 

Ron. 


+5  +1 

Ron*  knew 


+3  "3  , +5  -1 

Lynn  and  I Lou*  knew 


+3 

Lynn. 

+5  +2  +3  , t2  +5  0 

Ron*  knew*  Lyn  t n lor  Lou*  knew* 

+2 

Lynn. 

+4  -3  -1  +5  -3  +*+ 

Ron  may*  know  ! Lvnn  i and  Lou 


ma^r*  know jWa^ne. 




SUBSET  8A. 
PSS  L16S  - 

PSS  L169  - 

PSS  LI 70  - 

PSS  LI 71  - 

PSS  LI 72  - 

PSS  LI 73  - 

PSS  LI 74  - 

PSS  LI 75  - 

PSS  LI 76  - 


Coordinate  Sentences. 


+4  -3  -1  +5  , -3  +5 

SUSSUSUSS,  T183  Ron  may*  know  | Lvrrn  and  Lynn 


-3  -1  . +5 

may*  know  I Wayne . 

+5-1+  -i  +5  -3  +5 

SUSSUSUSS,  T183  Ron  may*  know  | Lynn  ! and  Lynn 

-1+  -1+5 

may*  know  I Ron. 


SUSSUSUSS,  T183 


+5  -3  | +3  | +3  -2  , +4  -4 

Ron  may  I know  I Lynn  Q£  I Lvnn  may* 


-1  l +5 
know  I Ron. 


+4  -2  ,+5-4  . +2,-2  +4  -2 

S U S S U S U (SU)  S,  T183  Ron  may  I ruin  I Lynn  I or  Ron  may 


+5  -2  +2 
marry*  Lynn. 

+4  -1  +4-5  +2  -4  +4  -1 

S U S S U S U (SU)  S,  T183  Ron  may | ruin  |Lynn  and*  Ron  may 

+4  -3  +2 

marry*  Lynn. 

+5  -1  +4  +1  -3  +3  -1 

SUSSUSUSSS,  T1B3  Ron  may*  know*  Lynn  |or  Ron  may 


. +5  -1  +1 

| not*  know*  Lynn. 

+5  +5  -5  +2  -4  +3 

SUSSUSUU  (SU)  S,  T183  Ron  may  fruin  | Lynn,  but  |Ron 


-2  +5.  +4  -3  +2 

may  | not  | marry  Lj  ynn. 


S U U(SU)S  U S U (SU)  S,  T183 


nn 


t 


SUSSSUSUSSS,  T184 


+4  -1  +5-3  +2 

Ron  may  rum  Lynn. 

+5  -3  |+3  , +5  ,+5  , -3 

Ron  may  I loan  I Lynn  I rum  I and 


+5  -2  +4  +5  +5 

Lou  may  |loan  I Wayne  1 oil. 




SUBSET 8A.
PSS  LI  77  - 


PSS  LI 78  - 


Coordinate  Sentences, 
s U S S S (US)  U S U S S S (US),  T185 


S U S S S (US)  U S U S S S (US),  T185 


+5  -3  +1  , +4 

Ron  may*  loan  Lynn 

+5  -2  +4  . -2  +4 

rum  In ! May  and  Lou 

-2  -1  . +3  | +2 

may*  loan  ! Wayne  oil 


-3  | +2 

in  ! May. 

+5  -3 

Ron  may 

+4  -3  , 

rum  in 


+3 

loan 


+5 
Lynn  | 


+4 

May 


-3  +5 

and  Lou 


-4.0  +3  +1 

may  I loan  1 Wayne*  rum 


-4  +3 

in*  May. 




SUBSET  8H. 
PSS  L206  - 

PSS  L207  - 

PSS  L208  - 

PSS  L209  - 

PSS  L210  - 

PSS  L211  - 

PSS  L212  - 

PSS  L213  - 

PSS  L211  - 


Coordinate  Verb  Phrases 
(SU)  S S U S S,  T129 


+4  .5  +4  +3  -5  +3 

Women  | own  y , arn  and  Iwear 

+4 

wool. 


S U (SU)  S U (SU)  S,  T193 


S U (SU)  S U (SU)  S,  T193 


+5  -2  +5-4 

R [on  may  | ruin 

l +1 
1 Lynn 

+5  -2 

0 

marry*  Lynn. 

+4  -2 

+5  -4 

i +"*  l 

Ron  may  I 

ruin 

iLynn  1 

+5  -a 

+5 

marry  1 Lou. 

+3  -2 

+5  -4  , 

+2 

Ron  may 

ruin  1 

Lynn, 

-5 


•v4.  -4  -4  +5  , +5 

Lo  lu,  and  1 annoy  I Wavne. 


U (SU)  S S U S S,  T195 


■1)  -2 


-1  ,+5  -4  . +4 

Will  | women  1 own  y larn  and 

+4  , +5 

wear  | wool? 


U (SU)  S S S S U S S,  T196 


-2  +5  -4+5+3  . +4 

Will  women  i own  I yarn,  iwear! 

+3  -2  +5  +4 

wool,  and  I woo  ' men? 


-2  +4  , +5-4  , +4  ,+4  -2 

U S (SU)  S (SU)  S U (US)  S,  T196  Will  | Ran  I ruin  I Lynn.  I 

+4  -2  -3  +4  +4 

Lou,  and  an  I noy  I Wayne? 


S S .?  U S S,  T197 


S S S U S S,  T197 


+3  +4  +3  -5  , +4 

Who  | owns*  yarn  and  | wears* 

+4 

wool? 

+2  +3  +4  -3  +4 

Who*  owns*  yarn  |_o£.  wears* 




+4 

wool? 




SUBSET  8H. 
PSS  L215  - 

PSS  L216  - 

PSS  L217  - 


Coordinate  Verb  Phrases 


S U (SU)  S U S (SU)  S,  T193 


S U (SU)  S U S (SU)  S,  T193 


S U (SU)  S U S (SU)  S,  T1S  3 


+5  -i  +5-5  +3  -5 

Ron  may  ruin  Lynn  and 


+2  +3  -4  +1 

not  Imarrv*  Lynn. 


+4  -i 
Ron  may 

+5  -4  +1 

marry*  Lynn. 


, +2  -3  +3 
I Lynn  .or  not 


n +5  -1  , 
Ron  may | 


+5  -5  +2 

£iua  Lli 


ynn 


but 


+2 

not 


+5  -3  ^ +1 

marry*  Lynn. 




SUBSET  8K.  Coordinate  Noun  Phrases 
PSS  L242  - (SUS)  S (SUS)  (USUU),  T215 

PSS  K243  - (SSUS)  S (SSUS) (USUU) , T216 

PSS  L244  - (SUS)  U S (SUS)(USUS),  T217 

PSS  L245  -US  (USS)  U (USS),  T202 
PSS  L246  -US  (USS)  U (USS),  T202 


+5  -4  +5  -1  . +5  -4 

Lou  and  1 Neal  knew  I Ron  and  I 

+5  -4  +5  -5  -2 

Lynn  Irespectively. 

+4  . +4  -4  | +5  | t-1 

Wayne,  I Lou,  and  I Neal  I k 'new 

+4  +4  -4  +5 

Lee,  I Ron,  jand  | Lynn,  | 

-4+4  -5-2 

respectively. 

+5  -4  +5  _4  o +5 

Lou  and  | Neal  | will  loan*  oil 

-4  +5  -2  +5  -3  +5 

an  |d  ore  to  I Lynn  and*  Ron. 

-2  +2  -5  0 +5  -4  -5  -1  +4 

I saw  an  | old  house  and  an  1 old  barn. 

-2  +1  -5  +5  +1  -4  -5  +5  0 

I saw  an  j old  | house  and  a 1 new  house- 


-3  +5  -4  +5  -5  -3  +4 

PSS  L247  - (USUSUUS)  U S (SUS)  U (SUSUS),  T202  The  two  alvailajble  meals 

+3  +5  +5  -4  +4 

were  | 1 1 oast , * ham  and  eggs 

-2  +5  +1  +5  -2  -4  +5 

and  pancakes,!  syrup  and  1 juice. 

-2  +2  +4  -5  -3  , +5  , -1 

PSS  L248  - (SSSUUS)  U (SUS)  U (SUUS),  T202  My  two  favorite  , meals  1 are  1 

+4  -4  +5  -1  -4  -2  -5  , 

haj a and  eggs  I and,  bizza  imsi  I 

+5 

beer. 

-3  +4  +4  -5  -3  +5  -1 

PSS  L249  - (SSSUUS)  U S (SUS)  U (SU),  T206  My  three  favorite  ! meals  are 

+5  +4  -5  +5  -2 

hash, ! ham  and  egos  and 

+5  -2 

pizza. 




SUBSET  8K.  Coordinate  Noun  Phrases 

PSS  L250  - (SSSUUS)  U (SUS)(SUS)  U (SUUS ) , T206 

-3  +4  +4-5  -3  , +5  , -2 

My  three  favorite  meals  1 are 

, +4  -4  . +3  +4  -4  +3 

1 hash  and  I beans,  ham  and  eggs 

-4  +5  -2  -3  . +4 

and!  pizza  and  I beer. 




SUBSET  11A. 
PSS  L107  - 
PSS  LI 08  - 
PSS  LI 09  - 

PSS  L110  - 
PSS  LI 11  - 
PSS  L112  - 

PSS  LI 13  • 

PSS  LI 14 

PSS  LI 15 

PSS  LI 16 
' jS  LI 17 
PSS  L11B 


PX  11963 

Relative  Clauses 
S U S S S S,  T142 
S U S S S S,  T143 


UN  IV  AC 


S S S S S,  T143 
S U S S S S,  T144 
S U S S S S,  T145 


-SUSS  (US)  S S,  T146 


- S S S (US)  S Sj  T146 


-SUUUSUSS,  T147 


- S U S U S S,  T147 
_ S S S S S,  T152 
_ S S S U S Si  T148 


T142 

+5  i 

Men  1 

-1  +4  +4 

who  1 knew*  Ron 

, +3  +5 

ran*  Maine. 

T143 

+5 

Men 

, -2  +5 

! whom  Ron*  knew 

i +2  , +5 

, ran  1 Maine. 

S S S,  T143 

-3  , 
The! 

+5-4  . -1  , 

a [irmen  1 whom 

-3+3  0 

Marie*  knew  | 

+3  +5 

ran*  Maine. 

+4  +4  +4  i +1  +5, 

Men | Ron* knew | ran Maine.

+e  -2  +2  +5  I +2  +5 

Lyn|n, | who | knew* Ron, | ran* Maine.

+5  -3  +5  ° , +3 

Lynn, | w|hom | Ron* knew, | ran |

+5 

Maine. 

Oil which | Ron* loaned to | Ann |

+4  -4  +5 

ruined*  Maine. 

+1+  +4  +3  -2  +3  +3  -3 

Oil | Ron* loaned to* Ann | ruined
+4 

Maine. 

+5  -1  0 -1+5  -1  . +2  , 

| Men | who are in | Rome | may | run |

+5 

Maine. 

+4  -3  +4  , -2  +2  +5 

Men in* Rome | may | run* Maine.

+5  +1  +2  +3  +5 

Men,* Ron* knew ran* Maine.

+4  +1  +3  -4  +5  +1 

Ann* knew* men | whom Ron* knew.


PSS L119 - S S S S S, T148
PSS L120 - S S S U S S, T149
PSS L121 - S S S U S S, T149
PSS L122 - S S S (US) U S S, T150
PSS L123 - S S S (US) S S, T150
PSS L124 - S S S (US) U S (US), T151


+5 +3 +3 +5 +1

Ann* knew* men* Ron* knew.


+5 +2 +5 -2 +5 +1

A|nn* knew* May, | whom Ron* knew.


+b +2 +5 -3 +5 +1 +5

Ann* knew* May | whom | Ron knew | too.


+5 +4 +5 -4 +5 -2 +5 +4

Ron* loaned | Lynn | the | oil | which Wayne* owned.


+5 +4 +5 -4 +4 +4 +5

Ron* loaned | Lynn the | oil | Wayne* owned.


+5 +3 +4 -3 +5 -2 +4 -2 +4

Ron* loaned | Lynn the | oil | which | Wayne bemoaned.



APPENDIX B: PRESENTATIONS AND PUBLICATIONS BY SPERRY UNIVAC

DURING THE ARPA/SUR PROGRAM

The following is a complete list of Sperry Univac publications, reports,
presentations, and unpublished ARPA SUR Notes, resulting from ARPA funding
during the SUR program:

Publications  and  Reports 

LEA, W. A., MEDRESS, M. F., and SKINNER, T. E., Prosodic Aids to Speech Recognition:
I. Basic Algorithms and Stress Studies, Univac Report No. PX 7940. Univac
Park, St. Paul, Minnesota, October 1972.

LEA, W. A., Influences of Phonetic Sequences and Stress on Fundamental Frequency
Contours of Isolated Words, J. Acoust. Soc. of America, Vol. 53, January, 1973,
346(A).

LEA, W. A., MEDRESS, M. F., and SKINNER, T. E., Use of Syntactic Segmentation and
Stressed Syllable Location in Phonemic Recognition, J. Acoust. Soc. of America,
Vol. 53, January, 1973, 356(A).

LEA, W. A., Syntactic Boundaries and Stress Patterns in Spoken English Texts, Univac
Report No. PX 10146. Univac Park, St. Paul, Minnesota, March, 1973.

LEA, W. A., MEDRESS, M. F., and SKINNER, T. E., Prosodic Aids to Speech Recognition:
II. Syntactic Segmentation and Stressed Syllable Location, Univac Report No.
PX 10232. Univac Park, St. Paul, Minnesota, April, 1973.

LEA, W. A., MEDRESS, M. F., and SKINNER, T. E., Prosodic Aids to Speech Recognition:
III. Relationships between Stress and Phonemic Recognition Results, Univac
Report No. PX 10430. Univac Park, St. Paul, Minnesota, September, 1973.

LEA, W. A., MEDRESS, M. F., and SKINNER, T. E., A Prosodically-Guided Speech
Understanding Strategy, Proc. IEEE Symposium on Speech Recognition, Carnegie-
Mellon University, Pittsburgh, Penn., April, 1974, 38-44.

LEA, W. A., Prosodic Aids to Speech Recognition: IV. A General Strategy for
Prosodically-Guided Speech Understanding, Univac Report No. PX 10791. Univac Park,
St. Paul, Minnesota, March, 1974.

LEA, W. A., Sentences for Controlled Testing of Acoustic Phonetic Components of
Speech Understanding Systems, Univac Report No. PX 10952. Univac Park, St. Paul,
Minnesota, September, 1974.

LEA, W. A., Prosodic Aids to Speech Recognition: V. A Summary of Results to Date,
Univac Report No. PX 11087. Univac Park, St. Paul, Minnesota, October, 1974.

LEA, W. A., MEDRESS, M. F., and SKINNER, T. E., A Prosodically-Guided Speech
Understanding Strategy, IEEE Trans. on Acoustics, Speech, and Signal Processing,
vol. ASSP-23, February, 1975, 30-38.




LEA, W. A. and KLOKER, D. R., Prosodic Aids to Speech Recognition: VI. Timing
Cues to Linguistic Structure and Improved Computer Programs for Prosodic Analysis,
Univac Report No. PX 11239. Univac Park, St. Paul, Minnesota, March, 1975.

LEA, W. A., Prosodic Aids to Speech Recognition: VII. Experiments on Detecting
and Locating Phrase Boundaries, Univac Report No. PX 11534. Univac Park, St. Paul,
Minnesota, November, 1975.

LEA, W. A., Acoustic Correlates of Stress and Juncture, Univac Report No. PX 11693.
Univac Park, St. Paul, Minnesota, June, 1976. To appear in Stress and Accent
(L. Hyman, Ed.), University of Southern California Press, Los Angeles.

LEA, W. A., The Importance of Prosodic Analysis in Speech Understanding Systems,
Univac Report No. PX 11694. Univac Park, St. Paul, Minnesota, June, 1976.
Submitted to IEEE Trans. Acoustics, Speech and Signal Processing.

LEA, W. A., Prosodic Aids to Speech Recognition: VIII. Listeners' Perceptions of
Selected English Stress Patterns, Univac Report No. PX 11711. Univac Park, St. Paul,
Minnesota, June, 1976.

LEA, W. A., Sentences and Hypotheses for Controlled Testing of Syntactic and
Prosodic Components of Speech Understanding Systems, Univac Report No. PX 10993.
Univac Park, St. Paul, Minnesota, November, 1976.

LEA, W. A., Prosodic Aids to Speech Recognition: IX. Acoustic-Prosodic Patterns
in Selected English Phrase Structures, Univac Report No. PX 11963. Univac Park,
St. Paul, Minnesota, December, 1976.


Oral Presentations

KLOKER, D. R. (April 1975). "Vowel and Sonorant Lengthening as Cues to Phonological
Phrase Boundaries," presented at the 89th Meeting of the Acoustical Society
of America, Austin, Texas.

KLOKER, D. R. (April 1976). "A Technique for the Automatic Location and
Description of Pitch Contours," presented at the 1976 International Conference
on Acoustics, Speech and Signal Processing, Philadelphia, Pennsylvania.

LEA, W. A., Influences of Phonetic Sequences and Stress on Fundamental Frequency
Contours of Isolated Words, presented at the 84th Meeting of the Acoustical Society
of America, Miami Beach, Florida, November, 1972.

LEA, W. A., MEDRESS, M. F., and SKINNER, T. E. (November 1972). "Use of Syntactic
Segmentation and Stressed Syllable Location in Phonemic Recognition,"
presented at the 84th Meeting of the Acoustical Society of America, Miami Beach,
Florida.

LEA, W. A., Prosodic Features and Linguistic Structure, presented at the ARPA
Tutorial Lectures on Acoustic-Phonetic Characteristics of English Sentences,
Cambridge, Massachusetts, December, 1972.




LEA, W. A., MEDRESS, M. F., and SKINNER, T. E., A Prosodically Guided Speech
Understanding Strategy, presented at IEEE Symposium on Speech Recognition,
Carnegie-Mellon University, Pittsburgh, PA., April, 1974.

LEA, W. A., "Perceived Stress as the 'Standard' for Judging Acoustical Correlates
of Stress", presented at the 86th Meeting of the Acoustical Society of America,
Los Angeles, California, November, 1973.


LEA, W. A., "Evidence that Stressed Syllables Are the Most Readily Decoded
Portions of Continuous Speech", presented at the 86th Meeting of the Acoustical
Society of America, Los Angeles, California, November, 1973.

LEA, W. A., "An Algorithm for Locating Stressed Syllables in Continuous Speech",
presented at the 86th Meeting of the Acoustical Society of America, Los Angeles,
California, November, 1973.

LEA, W. A., Prosodic Phenomena, session chaired at ARPA Phonological Rules Workshop,
Systems Development Corporation, Santa Monica, California, June, 1974.

LEA, W. A., A Speech Data Base for Testing Components of Speech Understanding
Systems, presented at 88th Meeting of the Acoustical Society of America, St. Louis,
Missouri, November, 1974.

LEA, W. A., Prosodic Hypotheses and Rules, presented at the ARPA Workshop on
Acoustic Phonetic and Phonological Rules, Bolt Beranek and Newman, Cambridge,
Massachusetts, November, 1974.

LEA, W. A., Isochrony and Disjuncture as Aids to Syntactic and Phonological
Analysis, presented at the 89th Meeting of the Acoustical Society of America,
Austin, Texas, April, 1975. Abstract in J. Acoust. Soc. America, Vol. 57,
Suppl. No. 1, Spring, 1975.

LEA, W. A., Acoustic Correlates of Stress and Juncture: A Systematic Testing
of Alternative Hypotheses, presented at the Symposium on Stress and Accent,
University of Southern California, February, 1976.

LEA, W. A., Stress in English: Listeners' Perceptions and Acoustic Correlates,
presented to the Linguistics Club, University of Minnesota, Minneapolis, May,
1976.

LEA, W. A., Perceived Stress Patterns in Selected English Phrase Structures,
presented to the American Assoc. of Phonetic Sciences, San Diego, California,
November 15, 1976.

LEA, W. A., Use of Intonational Phrase Boundaries to Select Syntactic Hypotheses
in a Speech Understanding System, presented at the 92nd Meeting, Acoustical
Society of America, San Diego, California, November 16, 1976. J. Acoust. Soc.
of America, vol. 60, Suppl. 1, Page S12.




Unpublished  ARPA  SUR  Notes 

2. MEDRESS, M. F., The Univac Speech Recognition Study (5 pages), December,
1971.

16. MEDRESS, M. F., Proposed Computer Phonetic Transcriptions (2 pages),
February, 1972.

17. MEDRESS, M. F., Univac Speech Bibliography (1 page), February, 1972.

32. MEDRESS, M. F., Revised Computer Phonetic Representations (2 pages), April,
1972.

36. MEDRESS, M. F., et al., "Acoustic Correlates of Linguistic Stress" (Literature
Survey, (22 pages)), June, 1972.

39. LEA, W. A., Acoustic Cues for Boundaries between Syntactic Units (6 pages),
August, 1972.

45. LEA, W. A., Considerations in the Design of Good Speech Texts (7 pages),
September, 1972.

48. MEDRESS, M. F., Plans for the First Segment of the Speech Data Base (2 pages),
October, 1972.

53. LEA, W. A., MEDRESS, M. F., and SKINNER, T. E., Use of Syntactic Segmentation
and Stressed Syllable Location in Phonemic Recognition (11 pages), December,
1972.

54. LEA, W. A., Syntactic Factors in the Initial Selection of Sentences for the
Data Base (7 pages), December, 1972.

63. LEA, W. A., Some Factors in the Selection of Utterances for Speaker
Normalization (3 pages), February, 1973.

67. LEA, W. A., Acoustic Analysis of 13 ARPA Sentences (4 pages), February, 1973.

82. LEA, W. A., MEDRESS, M. F., and SKINNER, T. E., Univac Final Technical Report:
Prosodic Aids to Speech Recognition: II. Syntactic Segmentation and Stressed
Syllable Location (34 pages), May, 1973.

108. LEA, W. A., MEDRESS, M. F., and SKINNER, T. E., Prosodic Aids to Speech
Recognition: III. Relationships between Stress and Phonemic Recognition
Results (6 pages), October, 1973.

139. MEDRESS, M. F., Prosodic Aids to Speech Recognition: IV. A General Strategy
for Prosodically-Guided Speech Understanding (65 pages), May, 1974.

141. LEA, W. A., Sentences for Testing Acoustic Phonetic Components of Systems
(18 pages), July, 1974.

154. LEA, W. A., Sentences for Controlled Testing of Acoustic Phonetic Components
of Speech Understanding Systems (41 pages), November, 1974.




155. LEA, W. A., Sentences for Testing Prosodic and Syntactic Components of
Systems (52 pages), November, 1974.

156. LEA, W. A., Prosodic Aids to Speech Recognition: V. A Summary of Results
to Date (29 pages), January, 1975.

162. LEA, W. A., An Improved Program for Detecting Boundaries Between Syntactic
Constituents (6 pages), March, 1975.

163. LEA, W. A., A Computer Program for Locating Stressed Syllables (27 pages),
March, 1975.

182. LEA, W. A., Prosodic Algorithms and Studies of Timing Cues to Linguistic
Structure (5y pages), August, 1975.

197. LEA, W. A., Experiments on Detecting and Locating Phrase Boundaries (43 pages),
January, 1976.

206. LEA, W. A., Two Papers about Prosodic Structures (131 pages), June, 1976.

213. LEA, W. A., Two Studies: Listeners' Perceptions of Stress and Prosodic Aids
to the BBN Parser (70 pages), July, 1976.




Unclassified
Security Classification

DOCUMENT CONTROL DATA - R & D

(Security classification of title, body of abstract and indexing annotation must be
entered when the overall report is classified)

1. ORIGINATING ACTIVITY (Corporate author)
Univac Defense Systems Division
P. O. Box 3525
St. Paul, Minnesota 55165

2a. REPORT SECURITY CLASSIFICATION
Unclassified

2b. GROUP

3. REPORT TITLE
Prosodic Aids to Speech Recognition: IX. Acoustic Prosodic Patterns in Selected
English Phrase Structures

4. DESCRIPTIVE NOTES
Final Technical Report, 1 September 1975 - 30 Nov.

5. AUTHOR(S)
Wayne A. Lea

6. REPORT DATE
31 December 1976

7a. TOTAL NO. OF PAGES
105 + vii

7b. NO. OF REFS
36

8a. CONTRACT OR GRANT NO.
DAHC15-73-C-0310

9a. ORIGINATOR'S REPORT NUMBER(S)
Univac Report No. PX-11963

9b. OTHER REPORT NO(S) (Any other numbers that may be assigned this report)
None

10. DISTRIBUTION STATEMENT
Distribution of this document is unlimited.

11. SUPPLEMENTARY NOTES

12. SPONSORING MILITARY ACTIVITY
Advanced Research Projects Agency
1400 Wilson Boulevard
Arlington, Virginia 22209

13. ABSTRACT

prosodic aids to speech recognition. A procedure for using intonational phrase
boundaries to select among alternative word and phrase hypotheses has been developed,
refined, and tested by hand analyses of sixteen sentences. This procedure was
designed for use with the BBN HWIM speech understanding system, and was totally
implemented, but not tested before the end of the BBN contract with ARPA.
Comparisons of control and parsing traces with acoustically detected phrase boundaries
did show, however, that intonational boundaries could help select correct words and
phrase structures and avoid erroneous hypotheses.

An experimental study of acoustic prosodic patterns in 255 sentences showed
several useful prosodic regularities. Over 91% of the syllables were correctly
located, and 92% of the stressed syllables were correctly categorized as stressed,
while 76% of the syntactic phrase boundaries were detected. Exactly which phrases
are or are not preceded by intonational phrase boundaries was determined. Intonation
contours were very firmly shown to involve rising pitch until the first stress,
progressively lower pitch in succeeding stresses, and a terminal fall (for
declaratives, commands, and WH questions) or rise (for yes/no questions). Parentheticals
were clearly marked by disjunctures, large Fo variations, and other prosodic features.
Contrastive phrase structures could be detected from prosodic cues.

A summary of Sperry Univac's total contributions to ARPA/SUR shows efforts to
define the importance of prosodics in speech understanding, to cooperate with other
contractors, and to conduct experiments on all aspects of prosodic structure.

Further work is suggested.


DD FORM 1473

Unclassified
Security Classification


KEY WORDS

Speech  Recognition 
Speech  Analysis 
Linguistic  Stress 
Prosodics

Prosodic  Features  Extraction 
Intonation 

Syntactic  Boundary  Detection 
Stressed  Syllable  Location 
Syntactic  Analysis 
Syntactic  Parsing 


